(57) Abstract

A method is provided for immobilizing a binding protein capable of binding to a
specific compound, using recombinant DNA techniques for producing said binding
protein or a functional part thereof. The binding protein is immobilized by
producing it as part of a chimeric protein also comprising an anchoring part
derivable from the C-terminal part of an anchoring protein, thereby ensuring that
the binding protein is localized in or at the exterior of the cell wall of the host
cell. Suitable anchoring proteins are yeast α-agglutinin, FLO1 (a protein
associated with the flocculation phenotype in S. cerevisiae), the Major Cell Wall
Protein of lower eukaryotes, and a proteinase of lactic acid bacteria. For
secretion the chimeric protein can comprise a signal peptide including those of
α-mating factor of yeast, α-agglutinin of yeast, invertase of Saccharomyces,
inulinase of Kluyveromyces, α-amylase of Bacillus, and proteinase of lactic acid
bacteria. Also provided are recombinant polynucleotides encoding such chimeric
protein, vectors comprising such polynucleotide, transformed microorganisms
having such chimeric protein immobilized on their cell wall, and a process for
carrying out an isolation process by using such transformed host, wherein a
medium containing said specific compound is contacted with such host cell to form
a complex, separating said complex from the medium and, optionally, releasing
said specific compound from said binding protein.
Title: Immobilized proteins with specific binding capacities and their use in
processes and products

Background of the invention 

The pharmaceutical, the fine chemicals and the food industry need a number of
compounds that have to be isolated from complex mixtures such as extracts of
animal or plant tissue, or fermentation broth. Often these isolation processes
determine the price of the product.

Conventional isolation processes are not very specific and during the isolation
processes the compound to be isolated is diluted considerably, with the consequence
that expensive steps for removing water or other solvents have to be applied.

For the isolation of some specific compounds affinity techniques are used. The
advantage of these techniques is that the compounds bind very specifically to a
certain ligand. However, these ligands are quite often very expensive.

To avoid spillage of these expensive ligands they can be linked to an insoluble
support. However, often linking the ligand is also expensive and, moreover, the
functionality of the ligand is often affected negatively by such procedure.
So a need exists for developing cheap processes for preparing highly effective
immobilized ligands.

Summary of the invention

The invention provides a method for immobilizing a binding protein capable of
binding to a specific compound, comprising the use of recombinant DNA techniques
for producing said binding protein or a functional part thereof still having said
specific binding capability, said protein or said part thereof being linked to the
outside of a host cell, whereby said binding protein or said part thereof is localized
in the cell wall or at the exterior of the cell wall by allowing the host cell to produce
and secrete a chimeric protein in which said binding protein or said functional part
thereof is bound with its C-terminus to the N-terminus of an anchoring part of an
anchoring protein capable of anchoring in the cell wall of the host cell, which
anchoring part is derivable from the C-terminal part of said anchoring protein.



Preferably, the host is selected from Gram-positive bacteria and fungi, which have a
cell wall at the outside of the host cell, in contrast to Gram-negative bacteria and
cells of higher eukaryotes such as animal cells and plant cells, which have a
membrane at the outside of their cells. Suitable Gram-positive bacteria comprise
lactic acid bacteria and bacteria belonging to the genera Bacillus and Streptomyces.
Suitable fungi comprise yeasts belonging to the genera Candida, Debaryomyces,
Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds belonging to the
genera Aspergillus, Penicillium and Rhizopus. In this specification the group of fungi
comprises the group of yeasts and the group of moulds, which are also known as
lower eukaryotes. In contrast to the cells in plants and animals, the group of bacteria
and lower eukaryotes are also indicated in this specification as microorganisms.
The invention also provides a recombinant polynucleotide capable of being used in a
method as described above, such polynucleotide comprising (i) a structural gene
encoding a binding protein or a functional part thereof still having the specific
binding capability, and (ii) at least part of a gene encoding an anchoring protein
capable of anchoring in the cell wall of a Gram-positive bacterium or a fungus, said
part of a gene encoding at least the anchoring part of said anchoring protein, which
anchoring part is derivable from the C-terminal part of said anchoring protein.
The anchoring protein can be selected from α-agglutinin, a-agglutinin, FLO1, the
Major Cell Wall Protein of a lower eukaryote, and proteinase of lactic acid bacteria.
Preferably, such polynucleotide further comprises a nucleotide sequence encoding a
signal peptide ensuring secretion of the expression product of the polynucleotide,
which signal peptide can be derived from a protein selected from the α-mating
factor of yeast, α-agglutinin of yeast, invertase of Saccharomyces, inulinase of
Kluyveromyces, α-amylase of Bacillus, and proteinase of lactic acid bacteria. The
polynucleotide can be operably linked to a promoter, which is preferably an
inducible promoter.

The invention further provides a recombinant vector comprising a polynucleotide
according to the invention, a chimeric protein encoded by a polynucleotide
according to the invention, and a host cell having a cell wall at the outside of its cell
and containing at least one polynucleotide according to the invention. Preferably at
least one polynucleotide is integrated in the chromosome of the host cell. Another
embodiment of this part of the invention is a host cell having a chimeric protein
according to the invention immobilized in its cell wall and having the binding
protein part of the chimeric protein localized in the cell wall or at the exterior of
the cell wall.

Another embodiment of the invention is a process for carrying out an isolation
process by using an immobilized binding protein or functional part thereof still
capable of binding to a specific compound, wherein a medium containing said
specific compound is contacted with a host cell according to the invention under
conditions whereby a complex between said specific compound and said immobilized
binding protein is formed, separating said complex from the medium originally
containing said specific compound and, optionally, releasing said specific compound
from said binding protein or functional part thereof.

Brief description of the figures 
In Figure 1 the composition of pEMBL9-derived plasmid pUR4122 is indicated, the
preparation of which is described in Example 1.

In Figure 2 the composition of plasmid pUR2741 is indicated, which is a derivative
of published plasmid pUR2740, see Example 1.

In Figure 3 the composition of pEMBL9-derived plasmid pUR2968 is indicated. Its
preparation is described in Example 1.

In Figure 4 the preparation of plasmid pUR4174 starting from plasmids pUR2741,
pUR2968 and pUR4122 is indicated, as well as the preparation of plasmid pUR4175
starting from plasmids pSY16, pUR2968 and pUR4122. These preparations are
described in Example 1.

In Figure 5 the composition of plasmid pUR2743.4 is indicated. Its preparation is
described in Example 2. It contains the 714 bp PstI-XhoI fragment given in
SEQ ID NO: 12, which fragment encodes an scFv-TRAS fragment of anti-traseolide®
antibody 02/01/01.

In Figure 6 the composition of plasmid pUR4178 is indicated. Its preparation is
indicated in Example 2. It contains the above mentioned 714 bp PstI-XhoI fragment
given in SEQ ID NO: 12. This plasmid is suitable for the expression of a fusion
protein between scFv-TRAS and αAGG preceded by the invertase signal sequence
(SUC2).

In Figure 7 the composition of plasmid pUR4179 is indicated. Its preparation is
indicated in Example 2. It contains the above mentioned 714 bp PstI-XhoI fragment
given in SEQ ID NO: 12. This plasmid is suitable for the expression of a fusion
protein between scFv-TRAS and αAGG preceded by the prepro-α-mating factor
signal sequence.

In Figure 8 a molecular design picture is given, showing the musk odour molecule
traseolide® and a modified musk antigen, described in Example 3.

In Figure 9 the composition of plasmid pUR4177 is indicated. Its construction is
described in Example 4. Plasmid pUR4177 contains the 734 bp EagI-XhoI DNA
fragment given in SEQ ID NO: 13 encoding the variable regions of the heavy and
light chain fragments from the monoclonal antibody directed against the human
chorionic gonadotropin (an scFv-HCG fragment) and is a 2 μm-based vector
suitable for production of the chimeric scFv-HCG-αAGG fusion protein preceded by
the invertase signal sequence and under the control of the GAL7 promoter.

In Figure 10 the composition of plasmid pUR4180 is indicated. Its preparation is
indicated in Example 4. It contains the above mentioned 734 bp EagI-XhoI DNA
fragment given in SEQ ID NO: 13 and is a 2 μm-based vector suitable for
production of the chimeric scFv-HCG-αAGG fusion protein preceded by the
prepro-α-mating factor signal sequence and under the control of the GAL7 promoter.

In Figure 11 the composition of plasmid pUR2990, a 2 μm-based vector, is
indicated, which is suggested in Example 5 as a starting vector for the preparation of
plasmid pUR4196 (see Figure 12). Plasmid pUR2990 contains a DNA fragment
encoding a chimeric lipase-FLO1 protein that will be anchored in the cell wall of a
lower eukaryote and can catalyze lipid hydrolysis.

In Figure 12 the composition of plasmid pUR4196 is indicated. Its preparation is
explained in Example 5. It contains a DNA fragment encoding a chimeric protein
comprising the scFv-HCG followed by the C-terminal part of the FLO1-protein, and
is a vector suitable for the production of a chimeric protein that is anchored in the
cell wall of the host organism and can bind HCG.
In Figure 13 the composition of plasmid pUR2985 is indicated. Its preparation is
described in Example 6. It contains a choB gene coding for the mature part of the
cholesterol oxidase (EC 1.1.3.6) obtained via PCR techniques from the chromosome
of Brevibacterium sterolicum.

In Figure 14 the composition of plasmid pUR2987 is indicated. Its preparation from
plasmid pUR2985 is described in Example 6. It contains a DNA sequence
comprising the choB gene coding for the mature part of the cholesterol oxidase
preceded by DNA encoding the prepro-α-mating factor signal sequence and
followed by DNA encoding the C-terminal part of α-agglutinin.

In Figure 15 the composition of the published plasmid pGKV550 is indicated. It is
described in Example 7 and contains the complete cell wall proteinase operon of
Lactococcus lactis subsp. cremoris Wg2, including the promoter, the ribosome
binding site and the prtP gene.

In Figure 16 the composition of plasmid pUR2988 is indicated. Its preparation is
described in Example 7. It is anticipated that this plasmid can be used for preparing
a further plasmid pUR2989, which after introduction in a lactic acid bacterium will
be responsible for producing a chimeric protein that will be anchored at the outer
surface of the lactic acid bacterium and is capable of binding cholesterol.

In Figure 17 the composition of plasmid pUR2993 is indicated. Its preparation is
described in Example 8. It is anticipated that this plasmid can be used for
transforming yeast cells that can bind a human epidermal growth factor (EGF)
through an anchored chimeric protein containing an EGF receptor.

In Figure 18 the composition of plasmids pUR4482 and pUR4483 is indicated. Their
preparation is described in Example 9. Plasmid pUR4482 is a yeast episomal
expression plasmid for expression of a fusion protein with the invertase signal
sequence, the CHv09 variable region, the Myc-tail, and the "X-P-X-P" Hinge region
of a camel antibody, and the α-agglutinin cell wall anchor region. Plasmid pUR4483
differs from pUR4482 in that it does not contain the "X-P-X-P" Hinge region.

In Figure 19 immunofluorescent labelling (anti-Myc antibody) of SU10 cells in the
exponential phase (OD530 = 0.5) expressing the genes of camel antibodies present on
plasmids pUR4424, pUR4482 and pUR4483 is shown.
Ph = phase contrast, Fl = fluorescence.

In Figure 20 immunofluorescent labelling (anti-human IgG antibody) of SU10 cells
in the exponential phase (OD530 = 0.5) expressing the genes of camel antibodies
present on plasmids pUR4424, pUR4482 and pUR4483 is shown.
Ph = phase contrast, Fl = fluorescence.



Abbreviations used in the Figures:

α-gal:                gene encoding guar α-galactosidase
AG-alpha-1/AGα1:      gene expressing α-agglutinin from S. cerevisiae
AGα1 cds/α-AGG:       coding sequence of α-agglutinin
Amp/amp r:            β-lactamase resistance gene
CHv09:                camel heavy chain variable 09 fragment
EmR:                  erythromycin resistance gene
f1:                   phage f1 replication sequence
FLO1/FLO (C-part):    C-terminal part of FLO1 coding sequence of flocculation
                      protein
Hinge:                Camel "X-P-X-P" Hinge region, see Example 9
LEU2:                 LEU2 gene
LEU2d/Leu2d:          truncated LEU2 gene
Leu 2d cs:            coding sequence LEU2d gene
MycT:                 camel Myc-tail
Ori MB1:              origin of replication MB1 derived from E. coli plasmid
Pgal7/pGAL7:          GAL7 promoter
Tpgk:                 terminator of the phosphoglyceratekinase gene
ppα-MF/MFα1ss:        prepro-part of α-mating factor (= signal sequence)
repA:                 gene encoding the repA protein required for replication
                      (Fig. 15/16)
ScFv (VH-VL):         single chain antibody fragment containing VH and VL chains
ss:                   signal sequence
SUC2:                 invertase signal sequence
2u/2 micron:          2 μm sequence
Detailed description of the invention

The present invention relates to the isolation of valuable compounds from complex
mixtures by making use of immobilized ligands. The immobilized ligands can be
proteins obtainable via genetic engineering and can consist of two parts, namely
both an anchoring protein or functional part thereof and a binding protein or
functional part thereof.

The anchoring protein sticks into cell walls of microorganisms, preferably lower
eukaryotes, e.g. yeasts and moulds. Often this type of protein has a long C-terminal
part that anchors it in the cell wall. These C-terminal parts have very special amino
acid sequences. A typical example is anchoring via C-terminal sequences of proteins
enriched in proline, see Kok (1990).

The C-terminal part of these anchoring proteins can contain a substantial number of
potential serine and threonine glycosylation sites. O-glycosylation of these sites gives
a rod-like conformation to the C-terminal part of these proteins.
In the case of anchored manno-proteins they seem to be linked to the glucan in the
cell wall of lower eukaryotes, as they cannot be extracted from the cell wall with
sodium dodecyl sulphate (SDS), but can be liberated by glucanase treatment, see
our co-pending patent application WO-94/01567 (UNILEVER) published 20 January
1994 and Schreuder c.s. (1993), both being published after the claimed priority date.
Another mechanism to anchor proteins at the outer side of a cell is to make use of
the property that a protein containing a glycosyl-phosphatidyl-inositol (GPI) group
anchors via this GPI group to the cell surface, see Conzelmann c.s. (1990).

The binding protein is so called, because it ligates or binds to the specific compound
to be isolated. If the N-terminal part of the anchoring protein is sufficiently capable
of binding to a specific compound, the anchoring protein itself can be used in a
process for isolating that specific compound. Suitable examples of a binding protein
comprise an antibody, an antibody fragment, a combination of antibody fragments, a
receptor protein, an inactivated enzyme still capable of binding the corresponding
substrate, and a peptide obtained via Applied Molecular Evolution, see Lewin
(1990), as well as a part of any of these proteinaceous substances still capable of
binding to the specific compound to be isolated. All these binding proteins are
characterized by specific recognition of the compounds or group of related
compounds to be isolated. The binding rate and release rate, and therefore the
binding constant between the specific compound to be isolated and the binding
protein, can be regulated either by changing the composition of the liquid extract in
which the compound is present or, preferably, by changing the binding protein by
protein engineering.

The gene coding for the chimeric protein comprising both the binding protein and
the anchoring protein (or functional parts thereof) can be placed under control of a
constitutive, inducible or derepressible promoter and will generally be preceded by a
DNA fragment encoding a signal sequence ensuring efficient secretion of the
chimeric protein. Upon secretion the chimeric protein will be anchored in the cell
wall of the microorganisms, thereby covering the surface of the microorganisms with
the chimeric protein. These microorganisms can be obtained in normal fermentation
processes and their isolation is a cheap process, when physical separation processes
are used, e.g. centrifugation or membrane filtration.
After washing, the isolated microorganisms can be added to liquid extracts
containing the valuable specific compound or compounds. After some time the
equilibrium between the bound and free specific compound(s) will be reached and
the microorganisms to which the specific compound or group of related compounds
is bound can be separated from the extract by simple physical techniques.
Alternatively, the microorganisms covered with ligands can be brought on a support
material and subsequently this coated support material can be used in a column.
The liquid extract containing the specific compound or compounds of interest can be
added to the column and afterwards the compound(s) can be released from the
ligand by changing the composition of the eluting liquid or the temperature or both.
A skilled person will recognize that in addition to these two possibilities other
modifications can be used for effecting the binding of the specific compound and the
ligand, their subsequent isolation and/or the release of the specific compound(s).
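The equilibrium between bound and free compound mentioned above can be
illustrated with a minimal sketch. It assumes a simple 1:1 binding model, which the
specification does not state; the function names and all numerical values are
hypothetical and serve only to show how a lower dissociation constant (release rate
divided by binding rate) shifts the equilibrium towards the bound state.

```python
import math


def dissociation_constant(release_rate, binding_rate):
    """K_d = k_off / k_on for a simple 1:1 binding step (assumed model)."""
    return release_rate / binding_rate


def complex_at_equilibrium(ligand_total, compound_total, k_d):
    """Equilibrium concentration of the ligand-compound complex.

    Solves (L_t - x) * (C_t - x) = K_d * x for the physically meaningful root.
    All arguments must share one concentration unit.
    """
    s = ligand_total + compound_total + k_d
    return (s - math.sqrt(s * s - 4.0 * ligand_total * compound_total)) / 2.0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    compound_total = 1.0
    for k_d in (10.0, 1.0, 0.1):
        x = complex_at_equilibrium(10.0, compound_total, k_d)
        print(f"K_d={k_d}: fraction of compound bound = {x / compound_total:.3f}")
```

Under this assumed model, either changing the extract composition or engineering the
binding protein acts through the same quantity, the dissociation constant.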
In particular the invention relates to chimeric proteins that are bound to the cell
wall of lower eukaryotes. Suitable lower eukaryotes comprise yeasts, e.g. Candida,
Debaryomyces, Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds e.g.
Aspergillus, Penicillium and Rhizopus. For some applications prokaryotes are also
applicable, especially Gram-positive bacteria, examples of which include lactic acid
bacteria, and bacteria belonging to the genera Bacillus and Streptomyces.

For lower eukaryotes the present invention provides genes encoding chimeric
proteins consisting of:

a. a DNA sequence encoding a signal sequence functional in a lower eukaryotic
   host, e.g. derived from a yeast protein including the α-mating factor, invertase,
   α-agglutinin, inulinase or derived from a mould protein e.g. xylanase;

b. a structural gene encoding a C-terminal part of a cell wall protein preceded by a
   structural gene encoding a protein that is capable of binding to the specific
   compound or group of compounds of interest, examples of which include

   - an antibody,

   - a single chain antibody fragment (scFv; see Bird and Webb Walker (1991)),

   - a variable region of the heavy chain (VH) or a variable region of the light chain
     (VL) of an antibody or that part of such variable region still containing one to
     three of the complementarity determining regions (CDRs),

   - an agonist-recognizing part of a receptor protein or a part thereof still capable
     of binding the agonist,

   - a catalytically inactivated enzyme, or a fragment of such enzyme still containing
     a substrate binding site of the enzyme,

   - specific lipid binding proteins or parts of these proteins still containing the lipid
     binding site(s), see Ossendorp (195^), and

   - a peptide that has been obtained via Applied Molecular Evolution, see Lewin
     (1990).

All expression products of these genes are characterized in that they consist of a
signal sequence and both a protein part that is capable of binding to the
compound(s) to be isolated, and a C-terminus of a typically cell wall bound protein,
examples of the latter including α-agglutinin, see Lipke c.s. (1989), a-agglutinin, see
Roy c.s. (1991), FLO1 (see Example 5 and SEQ ID NO: 14) and the Major Cell
Wall Protein of lower eukaryotes, which C-terminus is capable of anchoring the
expression product in the cell wall of the lower eukaryote host organism.
The expression of these genes encoding chimeric proteins can be under control of a
constitutive promoter, but an inducible promoter is preferred, suitable examples of
which include the GAL7 promoter from Saccharomyces, the inulinase promoter from
Kluyveromyces, the methanol-oxidase promoter from Hansenula, and the xylanase
promoter of Aspergillus. Preferably the constructs are made in such a way that the
new genetic information is integrated in a stable way in the chromosome of the host
cell, see e.g. WO-91/00920 (UNILEVER).
The lower eukaryotes transformed with the above mentioned genes can be grown in
normal fermentation, continuous fermentation, or fed batch fermentation processes.
The selection of a suitable process for growing the microorganism will depend on
the construction of the gene and the promoter used, and on the desired purity of the
cells after the physical separation procedure(s).

For bacteria the present invention deals with genes encoding chimeric proteins
consisting of:

a. a DNA sequence encoding a signal sequence functional in the specific bacterium,
   e.g. derived from a Bacillus α-amylase, a Bacillus subtilis subtilisin, or a
   Lactococcus lactis subsp. cremoris proteinase;

b. a structural gene encoding a C-terminal part of a cell wall protein preceded by a
   structural gene encoding a protein capable of binding to the specific compound or
   group of compounds of interest, examples of which are given above for a lower
   eukaryote.

All expression products of these genes are characterized in that they consist of a
signal sequence and both a protein part that is capable of binding to the specific
compound or specific group of compounds to be isolated, and a C-terminus of a
typically cell wall-bound protein such as the proteinase of Lactococcus lactis subsp.
cremoris strain Wg2, see Kok c.s. (1988) and Kok (1990), the C-terminus of which is
capable of anchoring the expression product in the cell wall of the host bacterium.
The invention is illustrated with the following Examples without being limited
thereto. First the endonuclease restriction sites mentioned in the Examples are
given.

Enzyme     Upper strand    Lower strand

BstEII     G GTNACC        CCANTG G
ClaI       AT CGAT         TAGC TA
NotI       GC GGCCGC       CGCCGG CG
SacI       GAGCT C         C TCGAG
EcoRI      G AATTC         CTTAA G
HindIII    A AGCTT         TTCGA A
NruI       TCG CGA         AGC GCT
SalI       G TCGAC         CAGCT G
EagI       C GGCCG         GCCGG C
NheI       G CTAGC         CGATC G
PstI       CTGCA G         G ACGTC
XhoI       C TCGAG         GAGCT C

Example 1. Construction of a gene encoding a chimeric protein that will be
           anchored in the cell wall of a lower eukaryote and is able to bind
           with high specificity lysozyme from a complex mixture.

Lysozyme is an anti-microbial enzyme with a number of applications in the
pharmaceutical and food industries. Several sources of lysozyme are known, e.g. egg
yolk or a fermentation broth containing a microorganism producing lysozyme.
Monoclonal antibodies have been raised against lysozyme, see Ward c.s. (1989), and
the mRNAs encoding the light and heavy chains of such antibodies have been
isolated from the hybridoma cells and used as template for the synthesis of cDNA
using reverse transcriptase. Starting from the plasmids as described by Ward c.s.
(1989), we constructed a pEMBL-derived plasmid, designated pUR4122, in which
the multiple cloning site of the pEMBL-vector, ranging from the EcoRI to the
HindIII site, was replaced by a 231 bp DNA fragment, whose nucleotide sequence is
given in SEQ ID NO: 1 and has an EcoRI site (GAATTC) at nucleotides 1-6, a PstI
site (CTGCAG) at nucleotides 105-110, a BstEII site (GGTCACC) at nucleotides
122-128, a XhoI site (CTCGAG) at nucleotides lOl-lll, and a HindIII site
(AAGCTT) at nucleotides 226-231.
Construction of pUR4122

Plasmid pEMBL9, see Dente c.s. (1983), was digested with EcoRI and HindIII and
the resulting large fragment was ligated with the double stranded synthetic DNA
fragment given in SEQ ID NO: 1. For the successive ligation of DNA fragments,
which finally form the coding sequence of a single chain antibody fragment for
lysozyme, the following elements were combined in the 231 bp DNA fragment (SEQ
ID NO: 1) inserted into the pEMBL9 vector: the 3' part of the GAL7 promoter,
the invertase signal sequence (SUC2), a PstI restriction site, a BstEII restriction site,
a sequence encoding the (GGGGS)x3 peptide linker connecting the VH and VL
fragments, a SacI restriction site, a XhoI restriction site and a HindIII restriction
site, resulting in plasmid pUR4119. To obtain the in frame fusion between VH and
the GGGGS-linker, plasmid pSW1-VHD1.3-VKD1.3-TAG1, see Ward c.s. (1989), was
digested with PstI and BstEII and a DNA fragment of 0.35 kbp was ligated in the
correspondingly digested pUR4119, resulting in plasmid pUR4119A. Subsequently
the plasmid pSW1-VHD1.3-VKD1.3-TAG1 was digested with SacI and XhoI and
this fragment containing the coding part of VL was finally ligated into the SacI/XhoI
sites of pUR4119A, resulting in plasmid pUR4122 (see Figure 1).

Construction of pUR4174, see Figure 4

To obtain S. cerevisiae episomal expression plasmids containing DNA encoding a cell
wall anchor derived from the C-terminal part of α-agglutinin, plasmid pUR2741 (see
Figure 2) was selected as starting vector. Basically, this plasmid is a derivative of
pUR2740, which is a derivative of plasmid pUR2730 as described in WO-91/19782
(UNILEVER) and by Verbakel (1991). The preparation of pUR2730 is clearly
described in Example 9 of EP-A1-0255153 (UNILEVER). Plasmid pUR2741 differs
from plasmid pUR2740 in that the EagI restriction site within the remaining part of
the already inactive tet resistance gene was deleted through NruI/SalI digestion. The
SalI site was filled in prior to religation.

After digesting pUR4122 with SacI (partially) and HindIII, the approximately 800 bp
fragment was isolated and cloned into the pUR2741 vector fragment, which was
obtained after digestion of pUR2741 with the same enzymes. The resulting plasmid
was named pUR4125.

A plasmid named pUR2968 (see Figure 3) was made by (1) digesting with HindIII
the AGα1-containing plasmid pLα21 published by Lipke c.s. (1989), (2) isolating an
about 6.1 kbp fragment and (3) ligating that fragment with HindIII-treated pEMBL9,
so that the 6.1 kbp fragment was introduced into the HindIII site present in the
multiple cloning site of the pEMBL9 vector.

Plasmid pUR4125 was digested with XhoI and HindIII and the about 8 kbp
fragment was ligated with the approximately 1.4 kbp NheI-HindIII fragment of
pUR2968, using XhoI/NheI adapters having the following sequence:

    XhoI                         NheI
5'- TC GAG ATC AAA GGC GGA TCT G      -3' = SEQ ID NO: 2
3'-      C TAG TTT CCG CCT AGA CGATC  -5' = SEQ ID NO: 3.

The plasmid resulting from the ligation of the appropriate parts of plasmids
pUR2968, pUR4125 and XhoI/NheI adapters was designated pUR4174 and encodes
a chimeric fusion protein consisting, starting at the amino terminus, of the invertase
signal (pre) peptide, followed by the scFv-LYS polypeptide and, finally, the
C-terminal part of α-agglutinin (see Figure 4).
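The XhoI/NheI adapter of SEQ ID NO: 2 and SEQ ID NO: 3 can be checked for
internal consistency with a short sketch. It is an illustration only: the final base of
SEQ ID NO: 2 is read as G, the helper names are hypothetical, and a four-nucleotide
5' overhang on each strand is assumed, as expected for XhoI (C^TCGAG) and NheI
(G^CTAGC) ends.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def check_adapter(upper_5to3, lower_3to5, overhang=4):
    """Check duplex pairing and report the two 5' single-stranded overhangs.

    The lower strand is given 3'->5', as printed in the specification.
    Returns (paired_ok, upper_overhang, lower_overhang), overhangs written 5'->3'.
    """
    lower_5to3 = lower_3to5[::-1]
    paired_ok = reverse_complement(upper_5to3[overhang:]) == lower_5to3[overhang:]
    return paired_ok, upper_5to3[:overhang], lower_5to3[:overhang]


SEQ_ID_2 = "TCGAGATCAAAGGCGGATCTG"  # upper strand, 5'->3'
SEQ_ID_3 = "CTAGTTTCCGCCTAGACGATC"  # lower strand, 3'->5' as printed

if __name__ == "__main__":
    print(check_adapter(SEQ_ID_2, SEQ_ID_3))
```

If the two strands are transcribed correctly, the paired region is fully complementary
and the overhangs are TCGA (compatible with an XhoI end) and CTAG (compatible
with an NheI end).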

Construction of pUR4175, see Figure 4

Upon digesting pUR4122 (see above) with PstI and HindIII, the approximately
700 bp fragment was isolated and ligated into a vector fragment of plasmid pSY16,
see Harmsen c.s. (1993), which was digested with EagI and HindIII, using
EagI-PstI adapters having the following sequence:

    EagI                     PstI
5'- GCC G CC CAG GTG CAG CTG CA  -3' = SEQ ID NO: 4
3'-       CGG GTC CAC GTC G      -5' = SEQ ID NO: 5

The resulting plasmid, named pUR4132, was digested with XhoI and HindIII and
ligated with the approximately 1.4 kbp NheI-HindIII fragment of pUR2968 (see
above), using XhoI/NheI adapters as described above, resulting in pUR4175 (see
Figure 4). This plasmid contains a gene encoding a chimeric protein consisting of
the α-mating factor prepro-peptide, followed by the scFv-LYS polypeptide and,
finally, the C-terminal part of α-agglutinin.
Example 2. Construction of genes encoding a series of homologous chimeric
           proteins that will be anchored in the cell wall of a lower eukaryote
           and are able to bind with high specificities the musk fragrance
           traseolide® from a complex mixture.

The isolation of RNA from the hybridoma cell lines, the preparation of cDNA and
amplification of gene fragments encoding the variable regions of antibodies by PCR
was performed according to standard procedures known from the literature, see e.g.
Orlandi c.s. (1989). For the PCR amplification different oligonucleotide primers
have been used.

For the heavy chain fragment:

A: AGG TSM AR CTGCAG S AGT CWG G = SEQ ID NO: 6
              PstI

in which S is C or G, M is A or C, R is A or G, and W is A or T,
and

B: TGA GGA GAG GGTGACC GT GGT CCC TTG GCC CC = SEQ ID NO: 7.
               BstEII

For the light chain fragment (Kappa):

C: GAG ATT GAGCTC ACC CAG TCT CCA = SEQ ID NO: 8.
           SacI

and

D: GTT TGA T CTCGAG CT TGG TCC C = SEQ ID NO: 9.
             XhoI
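Primer A is degenerate: the letters S, M, R and W each stand for two bases, as
defined above. A minimal sketch of how such a primer expands into its individual
sequences is given below; the helper name `expand` is hypothetical, and only the
four ambiguity codes defined in the text are included.

```python
from itertools import product

# Ambiguity codes as defined in the text for primer A.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG",  # S is C or G
    "M": "AC",  # M is A or C
    "R": "AG",  # R is A or G
    "W": "AT",  # W is A or T
}

PRIMER_A = "AGGTSMARCTGCAGSAGTCWGG"  # SEQ ID NO: 6, written without spaces
PSTI_SITE = "CTGCAG"


def expand(primer):
    """Return every non-degenerate sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(CODES[b] for b in primer))]


if __name__ == "__main__":
    variants = expand(PRIMER_A)
    print(len(variants), "variants")
    print(all(PSTI_SITE in v for v in variants))
```

With five two-fold degenerate positions the primer represents 32 sequences, and
since the PstI site lies entirely in non-degenerate positions, every variant carries it.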



Construction of pUR4143

To simplify future construction work an EagI restriction site was introduced in
pUR4122 (see above), at the junction between the invertase signal sequence and the
scFv-LYS. This was achieved by replacing the about 110 bp EcoRI-PstI fragment
within the synthetic fragment given in SEQ ID NO: 1 by synthetic adapters with the
following sequence:

EcoRI                 PstI
AATTCGGCCGTTCAGGTGCAGCTGCA = SEQ ID NO: 10
    GCCGGCAAGTCCACGTCG     = SEQ ID NO: 11.
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The resulting plasmid was designated pUR4 122.1: a construction vector for single 
chain Fv assembly in fnime behind an Eay} site for expression behind either the 
prepro-a-mating factor sequence or the SUC2 invertase signal sequence. 
After digesting the heavy chain PCR fragment with Pst\ and Bj/EII, two fragments 
5 were obtained: a Pst\ fragment of about 230 bp and a Pst\/BstlcM fragment of about 
110 bp. The latter fragment was cloned into vector ptJR4122.1, which was digested 
with Pstl and BstEW. The newly obtained plasmid (pUR4 122.2) was digested with 
Sad and Xhoh after which the light chain PCR fragment (digested with the same 
restriction enzymes) was cloned into the vector, resulting in pUR4122.3. This 

10 plasmid was digested with PstX^ after which the above described about 230 bp Fstl 
fragment was cloned into the plasmid vector, resulting in a plasmid called pUR4143. 
Two orientations are possible, but selection can be made by restriction analysis, as 
usual. Instead of the scFv-LYS gene originally presem in pUR4122, this new plasmid 
pUR4143 contains a gene encoding an scFv-TRAS fragment of anti-traseoUde 

15 antibody 02/01/02 (for the nucleotide sequence of the 714 bp Pstl-Xhol fragment 
see SEQ ID NO: 12). 
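
The EcoRI-PstI adapter used in this construction (SEQ ID NO: 10 and 11) can be checked for the expected cohesive ends. The sketch below is an illustration only; it assumes that SEQ ID NO: 11 is printed in the 3' to 5' direction beneath SEQ ID NO: 10, and the variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


TOP = "AATTCGGCCGTTCAGGTGCAGCTGCA"    # SEQ ID NO: 10, 5'->3'
BOTTOM_3_TO_5 = "GCCGGCAAGTCCACGTCG"  # SEQ ID NO: 11 as printed (assumed 3'->5')
bottom = BOTTOM_3_TO_5[::-1]          # the same strand rewritten 5'->3'

# Part of the top strand that is base-paired with the bottom strand.
duplex = reverse_complement(bottom)
start = TOP.find(duplex)

left_overhang = TOP[:start]                   # single-stranded 5' end of the top strand
right_overhang = TOP[start + len(duplex):]    # single-stranded 3' end of the top strand

print(left_overhang, right_overhang)
print("CGGCCG" in duplex)                     # the newly introduced EagI site
```

If the assumption about strand orientation holds, the annealed adapter exposes an AATT overhang compatible with EcoRI-cut DNA and a TGCA overhang compatible with PstI-cut DNA, and it carries the new EagI site (CGGCCG) in its double-stranded part.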

Construction of pUR4178 and pUR4179. 

After digesting pUR4143 with EagI and with HindIII, an about 715 bp fragment can
be isolated. Subsequently, this fragment can be cloned into the vector backbone
fragments of pUR2741 and pUR4175, that were digested with the same restriction
enzymes. In the case of pUR2741, this resulted in plasmid pUR2743.4 (see Figure
5). This plasmid can subsequently be cleaved with XhoI and HindIII and ligated with
the about 8 kbp XhoI-HindIII fragment of pUR4174, resulting in pUR4178 (see
Figure 6).

In the situation where pUR4175 was used as a starting vector, the resulting plasmid
was designated pUR4179 (see Figure 7).

Both plasmids, pUR4178 and pUR4179, were introduced into S. cerevisiae.

Example 3. The modification of the binding parts of the chimeric protein that
can bind traseolide® in order to improve the binding or release of
traseolide® under certain conditions.

Modification of binding properties of antibodies during the immune response is a
well known immunological phenomenon originating from the fine tuning of
complementarity determining sequences in the antibody's binding region to the
antigen's molecular properties. This phenomenon can be mimicked in vitro by
adjusting the antigen binding regions of antibody fragments based on molecular
models of these regions in contact with the antigen.
One such example consists of protein engineering the antimusk antibody M02/01/01
to a stronger binding variant M020501i.

First, a molecular model of M02/01/01 variable fragment (Fv) was constructed by
homology modelling, using the coordinates of the anti-lysozyme antibody HYHEL-10
as a template (Brookhaven Protein Data Bank entry: 3HFM). This model was
refined using Molecular Mechanics and Molecular Dynamics methods from within
the Biosym program DISCOVER, on a Silicon Graphics 4D240 workstation.
Secondly, the binding site of the resulting Fv was mapped by visually docking the
musk antigen into the CDR region, followed by a refinement using molecular
dynamics again. Upon inspection of the resulting model for packing efficiency (van
der Waals contact areas), it was concluded that substitution of ALA H96 by VAL
would increase the (hydrophobic) contact area between the ligand and Fv, and
consequently lead to a stronger interaction (see Figure 8).
When this mutation is introduced into M02/01/01, the cDNA-derived scFv from
Example 2, the result will be Fv M020501i; a variant with an increased affinity of at
least a factor of 5 can be expected, and the increased affinity could be measured
using fluorescence titration of the Fv with the musk odour molecule.
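
To put the expected factor of 5 in affinity into energetic terms, the standard relation between a fold change in binding constant and the change in binding free energy can be used. This is a minimal sketch that is not taken from the original text; the temperature of 25 °C and the function name are assumptions.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)


def ddg_from_affinity_gain(fold, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a `fold` increase in affinity."""
    return -R_KCAL * temperature_k * math.log(fold)


ddg = ddg_from_affinity_gain(5.0)
print(round(ddg, 2))   # roughly -0.95 kcal/mol at the assumed 25 degrees C
```

Under these assumptions a five-fold gain corresponds to slightly less than 1 kcal/mol of additional binding energy, which is of the order expected for burying one extra methyl-sized hydrophobic contact.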



WO 94/18330 PCT/EP94/00427

Example 4. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
hormones such as HCG.

Gene fragments, encoding the variable regions of the heavy and light chain
fragments from the monoclonal antibody directed against the human chorionic
gonadotropin were obtained from a hybridoma cell line in a similar way as described
in Example 2.

Subsequently, these HCG VH and VL gene fragments were cloned into plasmid
pUR4143 by replacing the corresponding PstI-BstEII and SacI-XhoI gene fragments,
resulting in plasmid pUR4146.

Similar to the method described in Example 2, the 734 bp EagI-XhoI fragment
(nucleotide sequence given in SEQ ID NO: 13) encoding the variable regions of the
heavy and light chain fragments from the monoclonal antibody directed against the
human chorionic gonadotropin (an scFv-HCG fragment) was isolated from pUR4146
and was introduced into the vector backbone fragment of pUR4178 (see Example 2)
and will be introduced into the vector backbone fragment of pUR4175 (see Example
1), both digested with the same restriction enzymes. The resulting plasmid
pUR4177 (see Figure 9) was, and pUR4180 (see Figure 10) will be, introduced into
S. cerevisiae strain SU10.

Example 5. Construction of a gene encoding a chimeric scFv-FLO1 protein that
will be anchored in the cell wall of a lower eukaryote and is able to
bind hormones such as HCG.

One of the genes associated with the flocculation phenotype in S. cerevisiae is the
FLO1 gene. The DNA sequence of a clone containing major parts of the FLO1 gene
has been determined, see SEQ ID NO: 14 giving 2685 bp of the FLO1 gene. The
cloned fragment appeared to be approximately 2 kb shorter than the genomic copy
as judged from Southern and Northern hybridizations, but encloses both ends of the
FLO1 gene. Analysis of the DNA sequence data indicates that the putative protein
contains at the N-terminus a hydrophobic region which conforms to a signal sequence
for secretion, a hydrophobic C-terminus that might function as a signal for the
attachment of a GPI-anchor and many glycosylation sites, especially in the
C-terminus, with 46.6% serine and threonine in the arbitrarily defined C-terminus
(aa 271-894). Hence, it is likely that the FLO1 gene product is located in an
orientated fashion in the yeast cell wall and may be directly involved in the process
of interaction with neighbouring cells.

The cloned FLO1 sequence might therefore be suitable for the immobilization of
proteins or peptides on the cell surface by a different type of cell wall anchor.
For the production of a chimeric protein comprising the scFv-HCG followed by the
C-terminal part of the FLO1 protein, plasmid pUR2990 (see Figure 11) can be used
as a starting vector. The preparation of episomal plasmid pUR2990 was described in
our co-pending patent application WO-94/01567 (UNILEVER) published on 20
January 1994, i.e. during the priority year. Plasmid pUR2990 comprises the chimeric
gene consisting of the gene encoding the Humicola lipase and a gene encoding the
putative C-terminal cell wall anchor domain of the FLO1 gene product, the chimeric
gene being preceded by the invertase signal sequence (SUC2) and the GAL7
promoter; further the plasmid comprises the yeast 2 μm sequence, the defective
Leu2 promoter described by Erhart and Hollenberg (1983), and the Leu2 gene, see
Roy c.s. (1991). Plasmid pUR4146, described in Example 4, can be digested with
PstI and XhoI, and the about 0.7 kbp PstI-XhoI fragment containing the scFv-HCG
coding sequence can be isolated. For the in frame fusion of this DNA sequence
between the C-terminal FLO1 part and the SUC2 signal sequence, the fragment can
be directly ligated with the 9.3 kbp EagI/NheI (partial) backbone of plasmid
pUR2990, resulting in plasmid pUR4196 (see Figure 12). This plasmid will comprise
an additional triplet encoding Ala at the transition between the SUC2 signal
sequence and the start of the scFv-HCG, and a E-I-K-G-G amino acid sequence in
front of the first amino acid (Ser) of the C part of FLO1 protein.

If in the previous Examples 1-5 the level of exposed antibody fragments is too low,
the production level can be increased by mutagenesis of the framework regions of
the antibody fragment. This can be done in a site directed way or by (targeted)
random mutagenesis, using techniques described in the literature.

Example 6. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
cholesterol.



In the literature two DNA sequences for cholesterol oxidase are described, the choB
gene from Brevibacterium sterolicum, see Ohta c.s. (1991) and the choA gene from
Streptomyces sp. SA-COO, see Ishizaka c.s. (1989). For the construction of a DNA
fusion between the choB gene coding for cholesterol oxidase (EC 1.1.3.6) and the
3' part of the AGα1 gene, the PCR technique on chromosomal DNA can be
applied. Chromosomal DNA can be isolated by standard techniques from
Brevibacterium sterolicum, and the DNA part coding for the mature part of the
cholesterol oxidase can be amplified through application of the following
corresponding PCR primers cho01pcr and cho02pcr:



cho01pcr

                     5'-GCC CCC AGC CGC ACC CTC G-3' = SEQ ID NO: 16
                     3'-CGG GGG TCG GCG TGG GAG C-5' = SEQ ID NO: 17
                        ||| ||| ||| ||| ||| ||| |
5'-AGATCT GAATTC GCGGCC GCC CCC AGC CGC ACC CTC G-3' = SEQ ID NO: 18
          EcoRI  NotI

cho02pcr

                               NheI       HindIII
3'-TAG TAG AGC AGG CTG TAG GTC CGATCG ACT TTCGAA TCTAGA-5' = SEQ ID NO: 19
   ||| ||| ||| ||| ||| ||| |||
5'-ATC ATC TCG TCC GAC ATC CAG-3'                          = SEQ ID NO: 20
3'-TAG TAG AGC AGG CTG TAG GTC-5'                          = SEQ ID NO: 21


Both primers can specifically hybridize with the target sequence, thereby amplifying
the coding part of the gene in such a way that the specific PCR product, after
Proteinase K treatment and digestion with EcoRI and HindIII, can be directly
cloned into a suitable vector, here preferably pTZ19R, see Mead c.s. (1986). This
will result in plasmid pUR2985 (see Figure 13).
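
The cloning sites carried by the two primers can be located with a simple string search. The sketch below is an illustration rather than part of the original procedure; the primer strings are transcribed from the primer diagram for cho01pcr and cho02pcr (an assumption, since the printed diagram is partly illegible), and the helper sites_in is hypothetical.

```python
# Recognition sequences of the enzymes named in the text (all palindromic).
SITES = {"EcoRI": "GAATTC", "NotI": "GCGGCCGC",
         "NheI": "GCTAGC", "HindIII": "AAGCTT"}


def sites_in(seq):
    """Map each enzyme to the 0-based position of its site, if the site is present."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# cho01pcr written 5'->3' (transcription from the diagram is an assumption).
CHO01PCR = "AGATCTGAATTCGCGGCCGCCCCCAGCCGCACCCTCG"

# cho02pcr is printed 3'->5' in the diagram; reverse it to read 5'->3'.
CHO02PCR_3_TO_5 = "TAGTAGAGCAGGCTGTAGGTCCGATCGACTTTCGAATCTAGA"
CHO02PCR = CHO02PCR_3_TO_5[::-1]

print(sites_in(CHO01PCR))   # sites at the 5' end of the amplified fragment
print(sites_in(CHO02PCR))   # sites at the 3' end of the amplified fragment
```

With these transcriptions, cho01pcr contributes the EcoRI and NotI sites and cho02pcr the NheI and HindIII sites, which matches the use of EcoRI and HindIII for cloning into pTZ19R.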

In addition to the already mentioned restriction sites both PCR primers generate
other restriction sites at the 5' end and the 3' end of the 1.5 kbp DNA fragment,
which can be used later on to fuse the fragment in frame between either the SUC2
signal sequence or the prepro-α-mating factor signal sequence on one side and the
C-terminus coding part of the α-agglutinin gene on the other side. To facilitate the
ligation behind the prepro-MF sequence a NotI site is introduced at the 5' end of
PCR oligonucleotide cho01pcr, allowing for example, the exchange of the 732 bp
EagI/NheI fragment containing the scFv-Lys coding sequence in pUR4175 for the
choB coding sequence.

To create an enzymatically inactive fusion protein between cholesterol oxidase and
α-agglutinin, the above described subcloning into pTZ19R can be used. Cholesterol
oxidase is an FAD-dependent enzyme for which the crystal structure of the
Brevibacterium sterolicum enzyme has been determined, see Vrielink c.s. (1991). The
enzyme displays homology with the typical pattern of the FAD-binding domain with
the Gly-X-Gly-X-X-Gly sequence near the N-terminus (amino acid 18-23).
Site-directed in vitro mutagenesis on the plasmid pUR2985 according to the
manufacturer's protocol (Muta-Gene kit, Bio-Rad) can be applied to inactivate the
FAD-binding site through replacing the triplet(s) encoding the Gly residue(s) by
triplets encoding other amino acids, thereby presumably inactivating the enzyme.
E.g. the following primer can be used for site-directed mutagenesis of 2 of the
conserved Gly residues.



pr 3'- CGG GAG CAG TAG CGG TCA CGT ATG CCG CCA CGG GAG CGG CGC -5'
cs 5'- GCC CTC GTC ATC GCC AGT GGA TAG GGC GGT CCG GTC GCC GCG -3'
       Ala Gly Gly Gly Gly Ala Ala Ala
           ↓   ↓
          Ala Ala

pr = primer = SEQ ID NO: 22
cs = coding strand = SEQ ID NO: 23

As a result of the mutagenesis with the described primer, plasmid pUR2986 will be
obtained. From this plasmid the DNA coding for the presumably inactivated
cholesterol oxidase can be released as a 1527 bp fragment through NotI/NheI
digestion, and subsequently directly used to exchange the scFv-Lys coding sequence
in pUR4175, thereby generating plasmid pUR2987 (see Figure 14). To obtain a
variant yeast secretion vector, where the secretion is directed through the SUC2
signal sequence, for example the 3823 bp long SacI/NheI segment of plasmid
pUR2986 can be used to replace the SacI/NheI fragment in pUR4174.
This inactivation of the FAD-binding site might be preferable over other mutations,
since an unchanged active centre can be expected to leave the binding properties of
cholesterol oxidase for cholesterol unaltered. Instead of the described Gly-Ala
exchanges at position 18 and 20 of the mature coding sequence, every other suitable
amino acid change can also be performed.

To inactivate the enzyme, site directed mutagenesis can optionally be performed
directly in the active site cavity, for example through exchange of the Glu331, a
residue appropriately positioned to act as the proton acceptor, thus generating a new
variant of an immobilized, enzymatically inactive fusion protein.



Example 7. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lactic acid bacterium and is able to
bind cholesterol.

It has been described that proteinase of Lactococcus lactis subsp. cremoris is
anchored to the cell wall through its 127 amino acid long C-terminus, see Kok c.s.
(1988) and Kok (1990). In a way similar to that described in Example 6, the
cholesterol oxidase of Brevibacterium sterolicum (choB) can be immobilized on the
surface of Lactococcus lactis. Fusions can be made between the choB
structural gene and the N-terminal signal sequence and the C-terminal anchor of the
proteinase of Lactococcus lactis. Plasmid pGKV550 (see Figure 15) contains the
complete proteinase operon of Lactococcus lactis subsp. cremoris Wg2, including the
promoter, a ribosome binding site and DNA fragments encoding the already
mentioned signal and anchor sequences, see Kok (1990). First a DNA fragment,
containing the main part of the signal sequence, flanked by a ClaI site and an EagI
site can be constructed with PCR on pGKV550 as follows:



Primer prt1:
5'-AA GAT CTA TCG ATC TTG TTA GCC GGT ACA-3' = SEQ ID NO: 24
Proteinase gene (non coding strand):
3'-TT CCC GAT AGC TAG AAC AAT CGG CCA TGT CAG-5' = SEQ ID NO: 25
           ClaI

Proteinase gene:   Gln Ala Lys
5'-GTC GGC GAA ATC CAA GCA AAG GCG GCT-3' = SEQ ID NO: 26
Primer prt2:
3'-CAG CCG CTT TAG GTT CGT TGC CGG CCC CCC TTC GAA CCC-5' = SEQ ID NO: 27
                            EagI           HindIII

After the PCR reaction as described in Example 6, the 98 bp long PCR fragment
can be isolated and digested with ClaI and HindIII. pGKV550 can subsequently be
cleaved partially with ClaI and completely with HindIII, after which digestions the
vector fragment, containing the promoter, the ribosome binding site, the DNA
fragment encoding the N-terminal 8 amino acids and the cell wall binding fragment
containing the 127 C-terminal amino acids of the proteinase gene, can be isolated on
gel.

A copy of the cholesterol oxidase gene, suitable for fusion with the prtP anchor
domain, can be produced by a PCR reaction using plasmid pUR2985 (Example 6) as
template and a combination of primer cho01pcr (see Example 6) and the following
primer cho03pcr instead of primer cho02pcr:



cho03pcr                            HindIII
3'-TAG TAG AGC AGG CTG TAG GTC CGA GTT CGA ACC TAG GC-5' = SEQ ID NO: 40
   ||| ||| ||| ||| ||| ||| |||
5'-ATC ATC TCG TCC GAC ATC CAG = SEQ ID NO: 20.



The about 1.53 kbp fragment generated by this reaction can be digested with NotI
and HindIII to produce a molecule which can subsequently be ligated with the large
EagI/HindIII fragment from pUR2988 (see Figure 16). The resulting plasmid,
pUR2989, will contain the cholesterol oxidase coding sequence inserted between the
signal sequence and the C-terminal cell wall anchor domain of the proteinase gene.
After introduction into Lactococcus lactis subsp. lactis MG1363 by electroporation,
this plasmid will express cholesterol oxidase under control of the proteinase
promoter. The transport through the membrane will be mediated by the proteinase
signal sequence and the immobilization of the cholesterol oxidase by the proteinase
anchor. As it is unlikely that the Lactococcus will secrete FAD as well, the
cholesterol oxidase will not be active but will be capable of binding cholesterol.



Example 8. Construction of a gene encoding a chimeric protein that will be
anchored in the cell wall of a lower eukaryote and is able to bind
growth hormones, such as the epidermal growth factor.

For the isolation of larger amounts of human epidermal growth factor (EGF) the
corresponding receptor can be used in the form of a fusion between the binding domain
and a C-terminal part of α-agglutinin as cell wall anchor. The complete cDNA
sequence of the human epidermal growth factor receptor has been cloned and sequenced.
For the construction of a fusion protein with EGF binding capacity the N-terminal
part of the mature receptor until the central 23 amino acids transmembrane region
can be utilized.

The plasmid pUR4175 can be used for the construction. Through digestion with
EagI and NheI (partial) a 731 bp DNA fragment containing the sequence coding for
scFv is released and can be replaced by a DNA fragment coding for the first 621
amino acids of human epidermal growth factor receptor. Initiating from an existing
human cDNA library or otherwise through production of a cDNA library by
standard techniques from preferentially EGF receptor overexpressing cells, e.g. A431
carcinoma cells, see Ullrich c.s. (1984), further PCR can be applied for the
generation of in frame linkage between the extracellular binding domain of the
human growth factor receptor (amino acid 1-622) and the C-terminal part of
α-agglutinin.

PCR oligonucleotides for the in frame linkage of human epidermal growth factor
receptor and the C-terminus of α-agglutinin.

a: PCR oligonucleotides for the transition between SUC2 signal sequence and the
N-terminus of mature EGF receptor.

               > mature EGF receptor
pr EGF1:       Ala Leu Glu Lys Lys Val = SEQ ID NO: 28
5'-GGG GCG GCC GCG CTG GAG GAA AAG AAA GTT TGC-3'
       NotI        ||| ||| ||| ||| ||| ||| |||
3'-CGC TCA GCC CGA GAC CTC CTT TTC TTT CAA ACG-5'
EGF rec (non-coding strand): = SEQ ID NO: 29

b: PCR oligonucleotides for the in frame transition between C terminus of the
extracellular binding domain of EGF receptor and the C terminal part of
α-agglutinin.

EGF rec (coding strand):
   Asn Gly Pro Ile Pro Ser Ala Thr
5'-AAT GGG CCT AAG ATC CCG TCC ATC GCC ACT-3' = SEQ ID NO: 30
   ||| ||| ||| ||| ||| ||| |||
3'-TTA CCC GGA TTC TAG GGC AGG CGA TCG GA ATTCGAA CCCC-5'
pr EGF2:                       NheI      HindIII

This fusion would result in an addition of 2 Ala amino acids between the signal 

sequence and the mature N-terminus of EGF receptor. 

The newly obtained 1.9 kbp PCR fragment can be digested with NotI and NheI and
directly ligated into the vector pUR4175 after digesting with the same enzymes,
resulting in plasmid pUR2993 (see Figure 17), comprising the GAL7 promoter, the
prepro-α-mating factor sequence, the chimeric EGF receptor binding domain
gene/α-agglutinin gene, the yeast 2 μm sequence, the defective LEU2 promoter and the
LEU2 gene. This plasmid can be transformed into S. cerevisiae and the transformed
cells can be cultivated in YP medium whereby expression of the chimeric protein
can be induced by adding galactose to the medium.
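
As a rough plausibility check on the fragment size quoted in this example, the length of DNA needed to encode the receptor part can be estimated from the number of amino acids. This is only a sketch: it counts three nucleotides per codon and ignores the short primer tails, and the function name is hypothetical.

```python
def coding_length_bp(n_residues):
    """Nucleotides needed to encode n_residues amino acids (three per codon)."""
    return 3 * n_residues


bp = coding_length_bp(621)          # first 621 amino acids of the EGF receptor
print(bp, round(bp / 1000, 1))      # length in bp and in kbp, rounded to 0.1 kbp
```

The estimate is consistent with a PCR fragment of about 1.9 kbp once the few extra nucleotides contributed by the primers are added.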



Example 9. Construction of genes encoding a chimeric protein anchored to the
cell wall of yeast, comprising a binding domain of a "Camelidae"
heavy chain antibody

Recently it was described that camels as well as a number of related species (e.g.
llamas) contain a considerable amount of IgG antibody molecules which are only
composed of heavy-chain dimers, see Hamers-Casterman c.s. (1993). Although these
"heavy-chain" antibodies are devoid of light chains, it was demonstrated that they
nevertheless have an extensive antigen-binding repertoire. In order to show that the
variable regions of this type of antibodies can be produced and will be linked to the
exterior of the cell wall of a yeast, the following constructs were prepared.

Construction of pUR2997, pUR2998 and pUR2999

The about 2.1 kbp EagI-HindIII fragment of pUR4177 (Example 4, Fig 9) was
isolated. By using PCR technology, an EcoRI restriction site was introduced
immediately upstream of the EagI site, whereby the C of the EcoRI site is the same
as the first C of the EagI site. The thus obtained EcoRI-HindIII fragment was
ligated into plasmid pEMBL9, which was digested with EcoRI and HindIII, which
resulted in pUR4177.A.

The EcoRI/NheI fragment of plasmid pUR4177.A was replaced by the EcoRI/NheI
fragments of three different synthetic DNA fragments (SEQ ID NO: 32, SEQ ID
NO: 33, and SEQ ID NO: 34) resulting in pUR2997, pUR2998 and pUR2999,
respectively. The about 1.5 kbp BstEII-HindIII fragments of pUR2997 and pUR2998
were isolated.



Construction of pUR4421

The multiple cloning site of plasmid pEMBL9, see Dente c.s. (1983), (ranging from
the EcoRI to the HindIII site) was replaced by a synthetic DNA fragment having the
nucleotide sequence given below, see SEQ ID NO: 35 giving the coding strand and
SEQ ID NO: 36 giving the non-coding strand. The 5'-part of this nucleotide
sequence comprises an EagI site, the first 4 codons of a Camelidae VH gene
fragment (nucleotides 16-27) and a XhoI site (CTCGAG) coinciding with codons 5
and 6 (nucleotides 28-33). The 3'-part comprises the last 5 codons of the Camelidae
VH gene (nucleotides 46-60) (part of which coincides with a BstEII site), eleven
codons of the Myc tail (nucleotides 61-93), see SEQ ID NO: 35 containing these
eleven codons and SEQ ID NO: 37 giving the amino acid sequence, and an EcoRI
site (GAATTC). The EcoRI site, originally present in pEMBL9, is not functional
any more, because the 5'-end of the nucleotide sequence contains AATTT instead
of AATTC, indicated below as (EcoRI). The resulting plasmid is called pUR4421.
The Camelidae VH fragment starts with amino acids Q-V-K and ends with amino
acids V-S-S.

   (EcoRI) EagI             XhoI             BstEII
5'-AATTTAGCGG CCGCCCAGGT GAAACTGCTC GAGTAAGTGA CTAAGGTCAC-   50
3'-    ATCGCC GGCGGGTCCA CTTTGACGAG CTCATTCACT GATTCCAGTG-
                    Q  V  K

  -CGTCTCCTCA GAACAAAAAC TCATCTCAGA AGAGGATCTG AATTAATGAG-  100
  -GCAGAGGAGT CTTGTTTTTG AGTAGAGTCT TCTCCTAGAC TTAATTACTC-
    V  S  S   E  Q  K   L  I  S  E   E  D  L   N  *  *      = SEQ ID NO: 37

   EcoRI             HindIII
  -AATTCATCAA ACGGTGATA     -3'  119 = SEQ ID NO: 35
  -TTAAGTAGTT TGCCACTATT CGA-5'  123 = SEQ ID NO: 36
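
The nucleotide coordinates quoted for this synthetic fragment can be checked by slicing and translating the coding strand. The sketch below is illustrative only; the sequence string is transcribed from the diagram of SEQ ID NO: 35 (an assumption, since one character of the printed sequence is illegible), and the helper names region and translate are hypothetical.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Coding strand of the synthetic fragment (SEQ ID NO: 35), 119 nucleotides,
# transcribed from the diagram (assumption).
SEQ35 = ("AATTTAGCGGCCGCCCAGGTGAAACTGCTCGAGTAAGTGACTAAGGTCAC"
         "CGTCTCCTCAGAACAAAAACTCATCTCAGAAGAGGATCTGAATTAATGAG"
         "AATTCATCAAACGGTGATA")


def region(first, last):
    """Slice using the 1-based, inclusive nucleotide numbering of the text."""
    return SEQ35[first - 1:last]


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate(region(16, 27)))   # first 4 codons of the VH fragment
print(region(28, 33))              # XhoI site, codons 5 and 6
print(translate(region(46, 60)))   # last 5 codons of the VH fragment
print(translate(region(61, 93)))   # eleven codons of the Myc tail
```

With this transcription the four regions read Q-V-K-L, CTCGAG, V-T-V-S-S and E-Q-K-L-I-S-E-E-D-L-N, in agreement with the description of the fragment.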
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Construction of pUR4424 

After digesting the plasmid pB09 with XhoI and BstEII, a DNA fragment of about
0.34 kbp was isolated from agarose gel. This fragment codes for a truncated VH
fragment, missing both the first 4 and the last 5 amino acids of the Camelidae VH
fragment. Plasmid pB09 was deposited as E. coli JM109 pB09 at the Centraal
Bureau voor Schimmelcultures, Baarn on 20 April 1993 with deposition number
CBS 271.93. The DNA and amino acid sequences of the Camel VH fragments
followed by the Flag sequence as present in plasmid pB09 were given in Figure 6B
of European patent application 93201239.6 (not yet published), which is herein
incorporated by reference. The obtained about 0.34 kbp fragment was cloned into
pUR4421. To this end plasmid pUR4421 was digested with XhoI and HindIII, after
which the about 4 kb vector fragment was isolated from an agarose gel. The
resulting vector was ligated with the about 0.34 kbp XhoI/BstEII fragment and a
synthetic DNA linker having the following sequence:

BstEII           HindIII
GTCACCGTCTCCTCATAATGA     = SEQ ID NO: 38
     GCAGAGGAGTATTACTTCGA = SEQ ID NO: 39

resulting in plasmid pUR4421-09.
Plasmid pSY16 was digested with EagI and HindIII, after which the about 6.5 kbp
long vector backbone was isolated and ligated with the about 0.38 kbp EagI/HindIII
fragment from pUR4421-09 resulting in pUR4424.

Construction of pUR4482 and pUR4483 

From pUR4424 the about 0.44 kbp SacI-BstEII fragment, coding for the invertase
signal sequence and the camel heavy chain variable 09 (= CHv09) fragment, was
isolated as well as the about 6.3 kbp SacI-HindIII vector fragment. The about 6.3
kbp fragment and the about 0.44 kbp fragment from pUR4424 were ligated with the
BstEII-HindIII fragment from pUR2997 or pUR2998 yielding pUR4482 and
pUR4483, respectively.

Plasmid pUR4482 is thus a yeast episomal expression plasmid for expression of a
fusion protein with the invertase signal sequence, the CHv09 variable region, the
Myc-tail and the Camel "X-P-X-P" Hinge region, see Hamers-Casterman c.s. (1993),
and the α-agglutinin cell wall anchor region. Plasmid pUR4483 differs from
pUR4482 in that it contains the Myc-tail but not the "X-P-X-P" Hinge region.
Similarly, the BstEII-HindIII fragment from pUR2999 can be ligated with the about
6.3 kbp vector fragment and the about 0.44 kbp fragment from pUR4424, resulting
in pUR4497, which will differ from pUR4482 in that it contains the "X-P-X-P" Hinge
region but not the Myc-tail.
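
The composition of the four constructs can be summarised in a small data structure, which makes it easier to see which of them carry the cell wall anchor or the Myc tail. This is an illustrative sketch: the domain labels are shorthand for the regions named in the text, the class Construct is hypothetical, and pUR4424 is listed only with the two regions the text states for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    name: str
    domains: tuple   # ordered from N- to C-terminus, as described in the text

    def has(self, domain):
        return domain in self.domains


ANCHOR = "alpha-agglutinin anchor"

CONSTRUCTS = [
    Construct("pUR4424", ("invertase signal", "CHv09")),
    Construct("pUR4482", ("invertase signal", "CHv09", "Myc tail", "hinge", ANCHOR)),
    Construct("pUR4483", ("invertase signal", "CHv09", "Myc tail", ANCHOR)),
    Construct("pUR4497", ("invertase signal", "CHv09", "hinge", ANCHOR)),
]

for c in CONSTRUCTS:
    print(c.name, "anchor:", c.has(ANCHOR), "Myc tail:", c.has("Myc tail"))

anchored = [c.name for c in CONSTRUCTS if c.has(ANCHOR)]
```

Read this way, pUR4424 is the only construct without the anchor, and pUR4497 is the only anchored construct that would not be expected to react with an anti-Myc antibody (an inference from its composition, not a result reported in the text).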

The plasmids pUR4424, pUR4482 and pUR4483 were introduced into
Saccharomyces cerevisiae SU10 by electroporation, and transformants were selected
on plates lacking leucine. Transformants from SU10 with pUR4424, pUR4482 or
pUR4483, respectively, were grown on YP with 5% galactose and analysed with
immuno-fluorescence microscopy, as described in Example 1 of our co-pending
WO-94/01567 (UNILEVER) published on 20 January 1994. This method was slightly
modified to detect the chimeric proteins, containing both the camel antibody and
the Myc tail, present at the cell surface.
In one method a monoclonal mouse anti-Myc antibody was used as a first antibody
to bind to the Myc part of the chimeric protein; subsequently a polyclonal
anti-mouse Ig antiserum labeled with fluorescein isothiocyanate (FITC) ex Sigma,
Product No. F-0527, was used to detect the bound mouse antibody and a positive
signal was determined by fluorescence microscopy.

In the other method a polyclonal rabbit anti-human IgG serum, which had earlier
been proven to cross-react with the camel antibodies, was used as a first antibody to
bind the camel antibody part of the chimeric protein; subsequently a polyclonal
anti-rabbit Ig antiserum labeled with FITC ex Sigma, Product No. F-0382, was used to
detect the bound rabbit antibody and a positive signal was determined by
fluorescence microscopy.

The results in Figure 19 and Figure 20 show clearly that fluorescence can be
observed on those cells in which a fusion protein of the CHv09 fragment with the
α-agglutinin cell wall anchor region is produced (pUR4482 and pUR4483). No
fluorescence, however, was visible on the cells which produce the CHv09 fragment
without this anchor (pUR4424), when viewed under the same circumstances.

Information on a deposit of a micro-organism under the Budapest Treaty is given on
page 26, lines 5-7 above. In agreement with Rule 28 (4) EPC, or a similar
arrangement for a State not being a Contracting State of the EPC, it is hereby
requested that a sample of such deposit, when requested, will be submitted to an
expert only.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: Unilever N.V.
(B) STREET: Weena 455
(C) CITY: Rotterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3013 AL

(A) NAME: Unilever PLC
(B) STREET: Unilever House Blackfriars
(C) CITY: London
(E) COUNTRY: United Kingdom
(F) POSTAL CODE (ZIP): EC4P 4BQ

(A) NAME: Leon Gerardus J. FRENKEN
(B) STREET: Geldersestraat 90
(C) CITY: Rotterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3011 MP

(A) NAME: Pieter DE GEUS
(B) STREET: Boeler 24
(C) CITY: Barendrecht
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-2991 KB

(A) NAME: Franciscus Maria KLIS
(B) STREET: Benedenlangs 102
(C) CITY: Amsterdam
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-1025 KL

(A) NAME: Holger York TOSCHKA
(B) STREET: Aeckern 1
c/o Langnese iglo, BR3
(C) CITY: REKEN
(E) COUNTRY: Germany
(F) POSTAL CODE (ZIP): D-48734

(A) NAME: Cornelis Theodorus VERRIPS
(B) STREET: Hagedoorn 18
(C) CITY: Maassluis
(E) COUNTRY: The Netherlands
(F) POSTAL CODE (ZIP): NL-3142 KB

(ii) TITLE OF INVENTION: Immobilized proteins with specific binding
capacities and their use in processes and products.

(iii) NUMBER OF SEQUENCES: 40

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 231 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vii) IMMEDIATE SOURCE:
(B) CLONE: fragment in pUR4119
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCGAGC TCATCACACA AACAAACAAA ACAAAATGAT GCTTTTGCAA GCCTTTCTTT 60
TCCTTTTGGC TGGTTTTGCA GCCAAAATAT CTGCGCAGGT GCAGCTGCAG TAATGAACCA 120
CGGTCACCGT CTCCTCAGGT GGAGGCGGTT CAGGCGGAGG TGGCTCTGGC GGTGGCGGAT 180
CGGACATCGA GCTCACTCAG ACCAAGCTCG AGATCAAACG GTGATAAGCT T 231

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vii) IMMEDIATE SOURCE:
(B) CLONE: linker XhoI-NheI coding strand
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

TCGAGATCAA AGGCGGATCT G 21

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vii) IMMEDIATE SOURCE:
(B) CLONE: linker XhoI-NheI non-coding strand
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CTAGCAGATC CGCCTTTGAT C 21

(2) INFORMATION FOR SEQ ID HOt 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker Eagl-PstX coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GGCCGCCCAG GTGCAGCTGC A 21

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker EagI-PstI non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCTGCACCTG GGC 13

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: PCR primer A (heavy chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

AGGTSMAKCT GCAGSAGTCW GG 22 
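
PCR primer A contains letters other than A, C, G and T. Assuming these are standard IUPAC ambiguity codes (S = C/G, M = A/C, K = G/T, W = A/T), the primer denotes a mixture of oligonucleotides. The Python sketch below enumerates that mixture for the sequence exactly as printed; the names `IUPAC` and `expand` are illustrative.

```python
# Illustrative expansion of a degenerate primer into its unambiguous variants.
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "M": "AC", "K": "GT", "W": "AT",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


primer_a = "AGGTSMAKCTGCAGSAGTCWGG"  # SEQ ID NO: 6 as printed in the listing
variants = expand(primer_a)

if __name__ == "__main__":
    print(len(variants), "variants")
    print(variants[0])
```

With five two-fold degenerate positions, the primer as printed corresponds to 2^5 = 32 distinct sequences.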

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PCR primer B (heavy chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

TGAGGAGACG GTGACCGTGG TCCCTTGGCC CC 32

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PCR primer C (light chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CACATTGAGC TCACCCAGTC TCCA 24

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE: 

(B) CLONE: PCR primer D (light chain) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GTTTGATCTC GAGCTTGGTC CC 22

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: linker EcoRI-PstI coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AATTCGGCCG TTCAGGTGCA GCTGCA 26 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: linker EcoRI-PstI non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GCTGCACCTG AACGGCCG 18

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 714 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ScFv anti-traseolide 02/01/01

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CTGCAGGAGT CTGGACCTGG CCTGGTGAAA CCTTCTCAGT CTCTGTCCCT CACCTGCACT 60

GTCACTGGCT ACTCAATCAC CAGTGATTTT GCCTGGAACT GGATCCGGCA GTTTCCAGGA 120

AACCAACTGG AGTGGATGGG CTACATAAGC TACAGTGGTA GCACTAGCTA CAACCCATCT 180

CTCAAAAGTC GAATCTCTCT CACTCGAGAC ACATCCAAGA ACCAGTTCTT CCTGCAGTTG 240

AATTCTGTGA CTACTGAGGA CACAGCCACA TATTACTGTG CAACGTCCCT AACATGGTTA 300

CTACGTCGGA AACGTTCTTA CTGGGGCCAA GGGACCACGG TCACCGTCTC CTCAGGTGGA 360

GGCGGTTCAG GCGGAGGTGG CTCTGGCGGT GGCGGATCGG ACATCGAGCT CACCCAGTCT 420

CCATCCTCCA TGTCTGTATC TCTGGGAGAC ACAGTCAGCA TCACTTGCCA TGCAAGTCAG 480

GACATTAGCA GTAATATAGG GTGGTTGCAG CAGAAACCAG GGAAATCATT TAAGGGCCTG 540

ATCTATCATG GAACCAACTT GGAAGATGGT ATTCCATCAA GGTTCAGTGG CAGTGGATCT 600

GGAGCAGATT ATTCCCTCAC CATCAGCAGC CTGGAATCTG AAGATTTTGC AGACTATTAC 660

TGTCTACAGT ATGCTCAGTT TCCATTCACG TTCGGCTCGG GGACCAAGCT CGAG 714



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 734 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ScFv anti-HCG

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CGGCCGTTCA GGTGCAGCTG CAGGAGTCTG GGGGACACTT AGTGAAGCCT GGAGGGTCCC 60

TGAAACTCTC CTGTGCAGCC TCTCGATTCG CTTTCACTAG CTTTGACATG TCTTGGATTC 120

GCCAGACTCC GGAGAAGAGG CTGGAGTGGG TCGCAAGCAT TACTAATGTT GGTACTTACA 180

CCTACTATCC AGGCAGTGTG AAGGGCCGAT TCTCCATCTC CAGAGACAAT GCCAGGAACA 240

CCCTAAACCT GCAAATGAGC AGTCTGAGGT CTGAGGACAC GGCCTTGTAT TTCTGTGCAA 300

GACAGGGGAC TGCGGCACAA CCTTACTGGT ACTTCGATGT CTGGGGCCAA GGGACCACGG 360

TCACCGTCTC CTCAGGTGGA GGCGGTTCAG GCGGAGGTGG CTCTGGCGGT GGCGGATCGG 420

ACATCGAGCT CACCCAGTCT CCAAAATCCA TGTCCATGTC CGTAGGAGAG AGGGTCACCT 480

TGAGCTGCAA GGCCAGTGAG ACTGTGGATT CTTTTGTGTC CTGGTATCAA CAGAAACCAG 540

AACAGTCTCC TAAATTGTTG ATATTCGGGG CATCCAACCG GTTCAGTGGG GTCCCCGATC 600

GCTTCACTGG CAGTGGATCT GCAACAGACT TCACTCTGAC CATCAGCAGT GTGCAGGCTG 660

AGGACTTTGC GGATTACCAC TGTGGACAGA CTTACAATCA TCCGTATACG TTCGGAGGGG 720

GGACCAAGCT CGAG 734

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2685 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Saccharomyces cerevisiae

(vii) IMMEDIATE SOURCE:
(B) CLONE: pYY105

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..2685

(D) OTHER INFORMATION: /product= "Flocculation protein"
/gene= "FLO1"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ATG ACA ATG CCT CAT CGC TAT ATG TTT TTG GCA GTC TTT ACA CTT CTG 48
Met Thr Met Pro His Arg Tyr Met Phe Leu Ala Val Phe Thr Leu Leu
1 5 10 15

GCA CTA ACT AGT GTG GCC TCA GGA GCC ACA GAG GCG TGC TTA CCA GCA 96
Ala Leu Thr Ser Val Ala Ser Gly Ala Thr Glu Ala Cys Leu Pro Ala
20 25 30

GGC CAG AGG AAA AGT GGG ATG AAT ATA AAT TTT TAC CAG TAT TCA TTG 144
Gly Gln Arg Lys Ser Gly Met Asn Ile Asn Phe Tyr Gln Tyr Ser Leu
35 40 45

AAA GAT TCC TCC ACA TAT TCG AAT GCA GCA TAT ATG GCT TAT GGA TAT 192
Lys Asp Ser Ser Thr Tyr Ser Asn Ala Ala Tyr Met Ala Tyr Gly Tyr
50 55 60

GCC TCA AAA ACC AAA CTA GGT TCT GTC GGA GGA CAA ACT GAT ATC TCG 240
Ala Ser Lys Thr Lys Leu Gly Ser Val Gly Gly Gln Thr Asp Ile Ser
65 70 75 80

ATT GAT TAT AAT ATT CCC TGT GTT AGT TCA TCA GGC ACA TTT CCT TGT 288
Ile Asp Tyr Asn Ile Pro Cys Val Ser Ser Ser Gly Thr Phe Pro Cys
85 90 95

CCT CAA GAA GAT TCC TAT GGA AAC TGG GGA TGC AAA GGA ATG GGT GCT 336
Pro Gln Glu Asp Ser Tyr Gly Asn Trp Gly Cys Lys Gly Met Gly Ala
100 105 110

TGT TCT AAT AGT CAA GGA ATT GCA TAC TGG AGT ACT GAT TTA TTT GGT 384
Cys Ser Asn Ser Gln Gly Ile Ala Tyr Trp Ser Thr Asp Leu Phe Gly
115 120 125

TTC TAT ACT ACC CCA ACA AAC GTA ACC CTA GAA ATG ACA GGT TAT TTT 432
Phe Tyr Thr Thr Pro Thr Asn Val Thr Leu Glu Met Thr Gly Tyr Phe
130 135 140

TTA CCA CCA CAG ACG GGT TCT TAC ACA TTC AAG TTT GCT ACA GTT GAC 480
Leu Pro Pro Gln Thr Gly Ser Tyr Thr Phe Lys Phe Ala Thr Val Asp
145 150 155 160

GAC TCT GCA ATT CTA TCA GTA GGT GGT GCA ACC GCG TTC AAC TGT TGT 528
Asp Ser Ala Ile Leu Ser Val Gly Gly Ala Thr Ala Phe Asn Cys Cys
165 170 175

GCT CAA CAG CAA CCG CCG ATC ACA TCA ACG AAC TTT ACC ATT GAC GGT 576
Ala Gln Gln Gln Pro Pro Ile Thr Ser Thr Asn Phe Thr Ile Asp Gly
180 185 190

ATC AAG CCA TGG GGT GGA AGT TTG CCA CCT AAT ATC GAA GGA ACC GTC 624
Ile Lys Pro Trp Gly Gly Ser Leu Pro Pro Asn Ile Glu Gly Thr Val
195 200 205

TAT ATG TAC GCT GGC TAC TAT TAT CCA ATG AAG GTT GTT TAC TCG AAC 672
Tyr Met Tyr Ala Gly Tyr Tyr Tyr Pro Met Lys Val Val Tyr Ser Asn
210 215 220

GCT GTT TCT TGG GGT ACA CTT CCA ATT AGT GTG ACA CTT CCA GAT GGT 720
Ala Val Ser Trp Gly Thr Leu Pro Ile Ser Val Thr Leu Pro Asp Gly
225 230 235 240

ACC ACT GTA AGT GAT GAC TTC GAA GGG TAC GTC TAT TCC TTT GAC GAT 768
Thr Thr Val Ser Asp Asp Phe Glu Gly Tyr Val Tyr Ser Phe Asp Asp
245 250 255

GAC CTA AGT CAA TCT AAC TGT ACT GTC CCT GAC CCT TCA AAT TAT GCT 816
Asp Leu Ser Gln Ser Asn Cys Thr Val Pro Asp Pro Ser Asn Tyr Ala
260 265 270

GTC AGT ACC ACT ACA ACT ACA ACG GAA CCA TGG ACC GGT ACT TTC ACT 864
Val Ser Thr Thr Thr Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr
275 280 285

TCT ACA TCT ACT GAA ATG ACC ACC GTC ACC GGT ACC AAC GGC GTT CCA 912
Ser Thr Ser Thr Glu Met Thr Thr Val Thr Gly Thr Asn Gly Val Pro
290 295 300

ACT GAC GAA ACC GTC ATT GTC ATC AGA ACT CCA ACC AGT GAA GGT CTA 960
Thr Asp Glu Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu
305 310 315 320

ATC AGC ACC ACC ACT GAA CCA TGG ACT GGC ACT TTC ACT TCG ACT TCC 1008
Ile Ser Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser
325 330 335

ACT GAG GTT ACC ACC ATC ACT GGA ACC AAC GGT CAA CCA ACT GAC GAA 1056
Thr Glu Val Thr Thr Ile Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu
340 345 350

ACT GTG ATT GTT ATC AGA ACT CCA ACC AGT GAA GGT CTA ATC AGC ACC 1104
Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Ile Ser Thr
355 360 365

ACC ACT GAA CCA TGG ACT GGT ACT TTC ACT TCT ACA TCT ACT GAA ATG 1152
Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met
370 375 380

ACC ACC GTC ACC GGT ACT AAC GGT CAA CCA ACT GAC GAA ACC GTG ATT 1200
Thr Thr Val Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu Thr Val Ile
385 390 395 400

GTT ATC AGA ACT CCA ACC AGT GAA GGT TTG GTT ACA ACC ACC ACT GAA 1248
Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Val Thr Thr Thr Thr Glu
405 410 415

CCA TGG ACT GGT ACT TTT ACT TCG ACT TCC ACT GAA ATG TCT ACT GTC 1296
Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met Ser Thr Val
420 425 430

ACT GGA ACC AAT GGC TTG CCA ACT GAT GAA ACT GTC ATT GTT GTC AAA 1344
Thr Gly Thr Asn Gly Leu Pro Thr Asp Glu Thr Val Ile Val Val Lys
435 440 445

ACT CCA ACT ACT GCC ATC TCA TCC AGT TTG TCA TCA TCA TCT TCA GGA 1392
Thr Pro Thr Thr Ala Ile Ser Ser Ser Leu Ser Ser Ser Ser Ser Gly
450 455 460

CAA ATC ACC AGC TCT ATC ACG TCT TCG CGT CCA ATT ATT ACC CCA TTC 1440
Gln Ile Thr Ser Ser Ile Thr Ser Ser Arg Pro Ile Ile Thr Pro Phe
465 470 475 480

TAT CCT AGC AAT GGA ACT TCT GTG ATT TCT TCC TCA GTA ATT TCT TCC 1488
Tyr Pro Ser Asn Gly Thr Ser Val Ile Ser Ser Ser Val Ile Ser Ser
485 490 495

TCA GTC ACT TCT TCT CTA TTC ACT TCT TCT CCA GTC ATT TCT TCC TCA 1536
Ser Val Thr Ser Ser Leu Phe Thr Ser Ser Pro Val Ile Ser Ser Ser
500 505 510

GTC ATT TCT TCT TCT ACA ACA ACC TCC ACT TCT ATA TTT TCT GAA TCA 1584
Val Ile Ser Ser Ser Thr Thr Thr Ser Thr Ser Ile Phe Ser Glu Ser
515 520 525

TCT AAA TCA TCC GTC ATT CCA ACC AGT AGT TCC ACC TCT GGT TCT TCT 1632
Ser Lys Ser Ser Val Ile Pro Thr Ser Ser Ser Thr Ser Gly Ser Ser
530 535 540

GAG AGC GAA ACG AGT TCA GCT GGT TCT GTC TCT TCT TCC TCT TTT ATC 1680
Glu Ser Glu Thr Ser Ser Ala Gly Ser Val Ser Ser Ser Ser Phe Ile
545 550 555 560

TCT TCT GAA TCA TCA AAA TCT CCT ACA TAT TCT TCT TCA TCA TTA CCA 1728
Ser Ser Glu Ser Ser Lys Ser Pro Thr Tyr Ser Ser Ser Ser Leu Pro
565 570 575

CTT GTT ACC AGT GCG ACA ACA AGC CAG GAA ACT GCT TCT TCA TTA CCA 1776
Leu Val Thr Ser Ala Thr Thr Ser Gln Glu Thr Ala Ser Ser Leu Pro
580 585 590

CCT GCT ACC ACT ACA AAA ACG AGC GAA CAA ACC ACT TTG GTT ACC GTG 1824
Pro Ala Thr Thr Thr Lys Thr Ser Glu Gln Thr Thr Leu Val Thr Val
595 600 605

ACA TCC TGC GAG TCT CAT GTG TGC ACT GAA TCC ATC TCC CCT GCG ATT 1872
Thr Ser Cys Glu Ser His Val Cys Thr Glu Ser Ile Ser Pro Ala Ile
610 615 620

GTT TCC ACA GCT ACT GTT ACT GTT AGC GGC GTC ACA ACA GAG TAT ACC 1920
Val Ser Thr Ala Thr Val Thr Val Ser Gly Val Thr Thr Glu Tyr Thr
625 630 635 640

ACA TGG TGC CCT ATT TCT ACT ACA GAG ACA ACA AAG CAA ACC AAA GGG 1968
Thr Trp Cys Pro Ile Ser Thr Thr Glu Thr Thr Lys Gln Thr Lys Gly
645 650 655

ACA ACA GAG CAA ACC ACA GAA ACA ACA AAA CAA ACC ACG GTA GTT ACA 2016
Thr Thr Glu Gln Thr Thr Glu Thr Thr Lys Gln Thr Thr Val Val Thr
660 665 670

ATT TCT TCT TGT GAA TCT GAC GTA TGC TCT AAG ACT GCT TCT CCA GCC 2064
Ile Ser Ser Cys Glu Ser Asp Val Cys Ser Lys Thr Ala Ser Pro Ala
675 680 685

ATT GTA TCT ACA AGC ACT GCT ACT ATT AAC GGC GTT ACT ACA GAA TAC 2112
Ile Val Ser Thr Ser Thr Ala Thr Ile Asn Gly Val Thr Thr Glu Tyr
690 695 700

ACA ACA TGG TGT CCT ATT TCC ACC ACA GAA TCG AGG CAA CAA ACA ACG 2160
Thr Thr Trp Cys Pro Ile Ser Thr Thr Glu Ser Arg Gln Gln Thr Thr
705 710 715 720

CTA GTT ACT GTT ACT TCC TGC GAA TCT GGT GTG TGT TCC GAA ACT GCT 2208
Leu Val Thr Val Thr Ser Cys Glu Ser Gly Val Cys Ser Glu Thr Ala
725 730 735

TCA CCT GCC ATT GTT TCG ACG GCC ACG GCT ACT GTG AAT GAT GTT GTT 2256
Ser Pro Ala Ile Val Ser Thr Ala Thr Ala Thr Val Asn Asp Val Val
740 745 750

ACG GTC TAT CCT ACA TGG AGG CCA CAG ACT GCG AAT GAA GAG TCT GTC 2304
Thr Val Tyr Pro Thr Trp Arg Pro Gln Thr Ala Asn Glu Glu Ser Val
755 760 765

AGC TCT AAA ATG AAC AGT GCT ACC GGT GAG ACA ACA ACC AAT ACT TTA 2352
Ser Ser Lys Met Asn Ser Ala Thr Gly Glu Thr Thr Thr Asn Thr Leu
770 775 780

GCT GCT GAA ACG ACT ACC AAT ACT GTA GCT GCT GAG ACG ATT ACC AAT 2400
Ala Ala Glu Thr Thr Thr Asn Thr Val Ala Ala Glu Thr Ile Thr Asn
785 790 795 800

ACT GGA GCT GCT GAG ACG AAA ACA GTA GTC ACC TCT TCG CTT TCA AGA 2448
Thr Gly Ala Ala Glu Thr Lys Thr Val Val Thr Ser Ser Leu Ser Arg
805 810 815

TCT AAT CAC GCT GAA ACA CAG ACG GCT TCC GCG ACC GAT GTG ATT GGT 2496
Ser Asn His Ala Glu Thr Gln Thr Ala Ser Ala Thr Asp Val Ile Gly
820 825 830

CAC AGC AGT AGT GTT GTT TCT GTA TCC GAA ACT GGC AAC ACC AAG AGT 2544
His Ser Ser Ser Val Val Ser Val Ser Glu Thr Gly Asn Thr Lys Ser
835 840 845

CTA ACA AGT TCC GGG TTG AGT ACT ATG TCG CAA CAG CCT CGT AGC ACA 2592
Leu Thr Ser Ser Gly Leu Ser Thr Met Ser Gln Gln Pro Arg Ser Thr
850 855 860

CCA GCA AGC AGC ATG GTA GGA TAT AGT ACA GCT TCT TTA GAA ATT TCA 2640
Pro Ala Ser Ser Met Val Gly Tyr Ser Thr Ala Ser Leu Glu Ile Ser
865 870 875 880

ACG TAT GCT GGC AGT GCA ACA GCT TAC TGG CCG GTA GTG GTT TAA 2685
Thr Tyr Ala Gly Ser Ala Thr Ala Tyr Trp Pro Val Val Val
885 890 895
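
The coding sequence of SEQ ID NO: 14 (2685 bases, i.e. 894 codons plus one stop codon) and the protein of SEQ ID NO: 15 (894 amino acids) can be cross-checked by translating the DNA with the standard genetic code. The Python sketch below does this for the first and last rows of the listing; the function name `translate` and the use of one-letter amino acid codes are choices made for this illustration only.

```python
# Illustrative translation of CDS rows with the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence (spaces allowed) up to the first stop codon."""
    cds = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


first_row = "ATG ACA ATG CCT CAT CGC TAT ATG TTT TTG GCA GTC TTT ACA CTT CTG"
last_row = "ACG TAT GCT GGC AGT GCA ACA GCT TAC TGG CCG GTA GTG GTT TAA"

if __name__ == "__main__":
    print(translate(first_row))  # residues 1-16
    print(translate(last_row))   # residues 881-894, stop codon excluded
    print(894 * 3 + 3)           # codons plus stop codon, in bases
```

Running the same function over every row offers a quick way to spot a codon that disagrees with the amino acid printed beneath it.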

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 894 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Thr Met Pro His Arg Tyr Met Phe Leu Ala Val Phe Thr Leu Leu
1 5 10 15

Ala Leu Thr Ser Val Ala Ser Gly Ala Thr Glu Ala Cys Leu Pro Ala
20 25 30

Gly Gln Arg Lys Ser Gly Met Asn Ile Asn Phe Tyr Gln Tyr Ser Leu
35 40 45

Lys Asp Ser Ser Thr Tyr Ser Asn Ala Ala Tyr Met Ala Tyr Gly Tyr
50 55 60

Ala Ser Lys Thr Lys Leu Gly Ser Val Gly Gly Gln Thr Asp Ile Ser
65 70 75 80

Ile Asp Tyr Asn Ile Pro Cys Val Ser Ser Ser Gly Thr Phe Pro Cys
85 90 95

Pro Gln Glu Asp Ser Tyr Gly Asn Trp Gly Cys Lys Gly Met Gly Ala
100 105 110

Cys Ser Asn Ser Gln Gly Ile Ala Tyr Trp Ser Thr Asp Leu Phe Gly
115 120 125

Phe Tyr Thr Thr Pro Thr Asn Val Thr Leu Glu Met Thr Gly Tyr Phe
130 135 140

Leu Pro Pro Gln Thr Gly Ser Tyr Thr Phe Lys Phe Ala Thr Val Asp
145 150 155 160

Asp Ser Ala Ile Leu Ser Val Gly Gly Ala Thr Ala Phe Asn Cys Cys
165 170 175

Ala Gln Gln Gln Pro Pro Ile Thr Ser Thr Asn Phe Thr Ile Asp Gly
180 185 190

Ile Lys Pro Trp Gly Gly Ser Leu Pro Pro Asn Ile Glu Gly Thr Val
195 200 205

Tyr Met Tyr Ala Gly Tyr Tyr Tyr Pro Met Lys Val Val Tyr Ser Asn
210 215 220

Ala Val Ser Trp Gly Thr Leu Pro Ile Ser Val Thr Leu Pro Asp Gly
225 230 235 240

Thr Thr Val Ser Asp Asp Phe Glu Gly Tyr Val Tyr Ser Phe Asp Asp
245 250 255

Asp Leu Ser Gln Ser Asn Cys Thr Val Pro Asp Pro Ser Asn Tyr Ala
260 265 270

Val Ser Thr Thr Thr Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr
275 280 285

Ser Thr Ser Thr Glu Met Thr Thr Val Thr Gly Thr Asn Gly Val Pro
290 295 300

Thr Asp Glu Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu
305 310 315 320

Ile Ser Thr Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser
325 330 335

Thr Glu Val Thr Thr Ile Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu
340 345 350

Thr Val Ile Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Ile Ser Thr
355 360 365

Thr Thr Glu Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met
370 375 380

Thr Thr Val Thr Gly Thr Asn Gly Gln Pro Thr Asp Glu Thr Val Ile
385 390 395 400

Val Ile Arg Thr Pro Thr Ser Glu Gly Leu Val Thr Thr Thr Thr Glu
405 410 415

Pro Trp Thr Gly Thr Phe Thr Ser Thr Ser Thr Glu Met Ser Thr Val
420 425 430

Thr Gly Thr Asn Gly Leu Pro Thr Asp Glu Thr Val Ile Val Val Lys
435 440 445

Thr Pro Thr Thr Ala Ile Ser Ser Ser Leu Ser Ser Ser Ser Ser Gly
450 455 460

Gln Ile Thr Ser Ser Ile Thr Ser Ser Arg Pro Ile Ile Thr Pro Phe
465 470 475 480

Tyr Pro Ser Asn Gly Thr Ser Val Ile Ser Ser Ser Val Ile Ser Ser
485 490 495

Ser Val Thr Ser Ser Leu Phe Thr Ser Ser Pro Val Ile Ser Ser Ser
500 505 510

Val Ile Ser Ser Ser Thr Thr Thr Ser Thr Ser Ile Phe Ser Glu Ser
515 520 525

Ser Lys Ser Ser Val Ile Pro Thr Ser Ser Ser Thr Ser Gly Ser Ser
530 535 540

Glu Ser Glu Thr Ser Ser Ala Gly Ser Val Ser Ser Ser Ser Phe Ile
545 550 555 560

Ser Ser Glu Ser Ser Lys Ser Pro Thr Tyr Ser Ser Ser Ser Leu Pro
565 570 575

Leu Val Thr Ser Ala Thr Thr Ser Gln Glu Thr Ala Ser Ser Leu Pro
580 585 590

Pro Ala Thr Thr Thr Lys Thr Ser Glu Gln Thr Thr Leu Val Thr Val
595 600 605

Thr Ser Cys Glu Ser His Val Cys Thr Glu Ser Ile Ser Pro Ala Ile
610 615 620

Val Ser Thr Ala Thr Val Thr Val Ser Gly Val Thr Thr Glu Tyr Thr
625 630 635 640

Thr Trp Cys Pro Ile Ser Thr Thr Glu Thr Thr Lys Gln Thr Lys Gly
645 650 655

Thr Thr Glu Gln Thr Thr Glu Thr Thr Lys Gln Thr Thr Val Val Thr
660 665 670

Ile Ser Ser Cys Glu Ser Asp Val Cys Ser Lys Thr Ala Ser Pro Ala
675 680 685

Ile Val Ser Thr Ser Thr Ala Thr Ile Asn Gly Val Thr Thr Glu Tyr
690 695 700

Thr Thr Trp Cys Pro Ile Ser Thr Thr Glu Ser Arg Gln Gln Thr Thr
705 710 715 720

Leu Val Thr Val Thr Ser Cys Glu Ser Gly Val Cys Ser Glu Thr Ala
725 730 735

Ser Pro Ala Ile Val Ser Thr Ala Thr Ala Thr Val Asn Asp Val Val
740 745 750

Thr Val Tyr Pro Thr Trp Arg Pro Gln Thr Ala Asn Glu Glu Ser Val
755 760 765

WO 94/18330 PCT/EP94/00427



Ser Ser Lys Met Asn Ser Ala Thr Gly Glu Thr Thr Thr Asn Thr Leu
770 775 780

Ala Ala Glu Thr Thr Thr Asn Thr Val Ala Ala Glu Thr Ile Thr Asn
785 790 795 800

Thr Gly Ala Ala Glu Thr Lys Thr Val Val Thr Ser Ser Leu Ser Arg
805 810 815

Ser Asn His Ala Glu Thr Gln Thr Ala Ser Ala Thr Asp Val Ile Gly
820 825 830

His Ser Ser Ser Val Val Ser Val Ser Glu Thr Gly Asn Thr Lys Ser
835 840 845

Leu Thr Ser Ser Gly Leu Ser Thr Met Ser Gln Gln Pro Arg Ser Thr
850 855 860

Pro Ala Ser Ser Met Val Gly Tyr Ser Thr Ala Ser Leu Glu Ile Ser
865 870 875 880

Thr Tyr Ala Gly Ser Ala Thr Ala Tyr Trp Pro Val Val Val
885 890

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GCCCCCAGCC GCACCCTCG 19 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CGAGGGTGCG GCTGGGGGC 19

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: cho01pcr primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AGATCTGAAT TCGCGGCCGC CCCCAGCCGC ACCCTCG 37

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: cho02pcr primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
AGATCTAAGC TTTCAGCTAG CCTGGATGTC GGACGAGATG AT 42
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

ATCATCTCGT CCGACATCCA G 21

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

CTGGATGTCG GACGAGATGA T 21

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: mutagenesis primer ChoB 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

CGCGGCGACG GCACCGCCGT ATGCACTGGC GATGACGAGG GC 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE:

(B) CLONE: ChoB template coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GCCCTCGTCA TCGGCAGTGG ATACGGCGGT GCCGTCGCCG CG 42

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE: 

(B) CLONE: primer prtl 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

AAGATCTATC GATCTTGTTA GCCGGTACA 29

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: proteinase template non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GACTGTACCG GCTAACAAGA TCGATAGCCC TT 32

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE:

(B) CLONE: proteinase template coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
GTCGGCGAAA TCCAAGCAAA GGCGGCT 27
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vii) IMMEDIATE SOURCE: 

(B) CLONE: prt2 primer 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CCCAAGCTTC CCCCCGGCCG TTGCTTGGAT TTCGCCGAC 39
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

GGGGCGGCCG CGCTGGAGGA AAAGAAAGTT TGC 33

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: EGF receptor template non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

GCAAACTTTC TTTTCCTCCA GAGCCCGACT CGC 33

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF receptor template coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

AATGGGCCTA AGATCCCGTC CATCGCCACT 30

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: EGF2 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CCCCAAGCTT AAGGCTAGCG GACGGGATCT TAGGCCCATT 40
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 177 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: VhC - AGα1 linker with MycT and Hinge

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GAATTCCAGG TCACCGTCTC CTCAGAACAA AAACTCATCT CAGAAGAGGA TCTGAATGAA 60
CCAAAGATTC CACAACCTCA ACCAAAGCCA CAACCTCAAC CACAACCACA ACCAAAACCT 120
CAACCAAAGC CAGAACCAGA ATCTACTTCC CCAAAGTCTC CAGCTAGCCT TAAGCTT 177
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 63 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: VhC - AGα1 linker with MycT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GAATTCCAGG TCACCGTCTC CTCAGAACAA AAACTCATCT CAGAAGAGGA TCTGAATGCT 60
AGC 63
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(vii) IMMEDIATE SOURCE:

(B) CLONE: VhC - AGα1 linker with Hinge
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GAATTCCAGG TCACCGTCTC CTCAGAACCA AAGATTCCAC AACCTCAACC AAAGCCACAA 60 
CCTCAACCAC AACCACAACC AAAACCTCAA CCAAAGCCAG AACCAGAATC TACTTCCCCA 120 
AAGTCTCCAG CTAGCCTTAA GCTT 144 
(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 119 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: fragment in pUR4421 coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

AATTTAGCGG CCGCCCAGGT GAAACTGCTC GAGTAAGTGA CTAAGGTCAC CGTCTCCTCA 60

GAACAAAAAC TCATCTCAGA AGAGGATCTG AATTAATGAG AATTCATCAA ACGGTGATA 119

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 119 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: fragment in pUR4421 non-coding strand 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

AGCTTATCAC CGTTTGATGA ATTCTCATTA ATTCAGATCC TCTTCTGAGA TGAGTTTTTG 60

TTCTGAGGAG ACGGTGACCT TAGTCACTTA CTCGAGCAGT TTCACCTGGG CGGCCGCTA 119

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: Myc tail

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn
1 5 10

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:

(B) CLONE: BstEII-HindIII linker coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GTCACCGTCT CCTCATAATG A 21

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: BstEII-HindIII linker non-coding strand

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

AGCTTCATTA TGAGGAGACG 20

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vii) IMMEDIATE SOURCE:

(B) CLONE: primer cho03pcr

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CGGATCCAAG CTTGAGCCTG GATGTCGGAC GAGATGAT 38

CLAIMS



1. A method for immobilizing a binding protein capable of binding to a specific
compound, comprising the use of recombinant DNA techniques for producing
said binding protein or a functional part thereof still having said specific binding
capability, said protein or said part thereof being linked to the outside of a host cell,
whereby said binding protein or said part thereof is localized in the cell wall or at
the exterior of the cell wall by allowing the host cell to produce and secrete a
chimeric protein in which said binding protein or said functional part thereof is
bound with its C-terminus to the N-terminus of an anchoring part of an anchoring
protein capable of anchoring in the cell wall of the host cell, which anchoring part is
derivable from the C-terminal part of said anchoring protein.

2. The method of claim 1, in which the host is selected from the group 
consisting of Gram-positive bacteria and fungi. 

3. The method of claim 2, in which the host is a Gram-positive bacterium
selected from the group consisting of lactic acid bacteria, and bacteria belonging to
the genera Bacillus and Streptomyces.

4. The method of claim 2, in which the host is a fungus selected from the
group consisting of yeasts belonging to the genera Candida, Debaryomyces,
Hansenula, Kluyveromyces, Pichia and Saccharomyces, and moulds belonging to the
genera Aspergillus, Penicillium and Rhizopus.



5. A recombinant polynucleotide comprising 

(i) a structural gene encoding a binding protein or a functional part thereof 
still having the specific binding capability, and 

(ii) at least part of a gene encoding an anchoring protein capable of anchoring
in the cell wall of a Gram-positive bacterium or a fungus, said part of a
gene encoding at least the anchoring part of said anchoring protein, which
anchoring part is derivable from the C-terminal part of said anchoring
protein.

6. The polynucleotide of claim 5, wherein the anchoring protein is selected 
from the group consisting of α-agglutinin, a-agglutinin, FLO1, the Major Cell Wall
Protein of a fungus, and proteinase of lactic acid bacteria. 

7. The polynucleotide of claim 5, further comprising a nucleotide sequence 
encoding a signal peptide ensuring secretion of the expression product of the
polynucleotide.

8. The polynucleotide of claim 7, wherein the signal peptide is derived from a 
protein selected from the group consisting of the α-mating factor of yeast,
α-agglutinin of yeast, invertase of Saccharomyces, inulinase of Kluyveromyces,
α-amylase of Bacillus, and proteinase of lactic acid bacteria.

9. The polynucleotide of any of claims 5-8, operably linked to a promoter, 
which can be an inducible promoter. 

10. A recombinant vector comprising a polynucleotide as claimed in any of 
claims 5-9.

11. A chimeric protein encoded by a polynucleotide as claimed in any of 
claims 5-9. 

12. A host cell having a cell wall at the outside of its cell and containing at 
least one polynucleotide as claimed in any of claims 5-9. 

13. The host cell of claim 12, having at least one polynucleotide as claimed in 
any of claims 5-9 integrated in its chromosome.

14. A host cell having a chimeric protein as claimed in claim 11 immobilized
in its cell wall and having the binding protein part of the chimeric protein localized
in the cell wall or at the exterior of the cell wall.

15. The host cell of any of claims 12-14, which is a fungus selected from the
group consisting of yeasts and moulds.

16. A process for carrying out an isolation process by using an immobilized
binding protein or functional part thereof still capable of binding to a specific
compound, wherein a medium containing said specific compound is contacted with a
host cell as claimed in any of claims 12-15 under conditions whereby a complex
between said specific compound and said immobilized binding protein is formed,
separating said complex from the medium originally containing said specific
compound and, optionally, releasing said specific compound from said binding
protein or functional part thereof.